Abstract

In many high energy experiments, physics quantities are obtained by measuring cross sections at a few energy points over an energy region. Such experiments are referred to as scan experiments. The optimal design of a scan experiment (how many energy points, at which energies, and with what luminosity at each point) is of great significance both for scientific research and from an economic viewpoint. Two approaches, one based on the sampling technique and the other on analytical proof, are adopted to work out the optimized scan scheme for the relevant parameters. The final results indicate that, for a scan experiment with n parameters, n energy points are necessary and sufficient for the optimal determination of these n parameters; each optimal position can be obtained by a single-parameter scan (sampling method) or by analysis of an auxiliary function (analytic method); and the luminosity allocation among the points can be determined analytically according to the relative importance of the parameters. By virtue of the second optimization theory established in this paper, it is feasible to construct the optimal scheme for any scan experiment.

PACS : 87.55.de; 87.55.kh; 13.66.Jn 
1 Introduction

The scan method is a useful tool for many kinds of studies in high energy physics. Firstly, it plays an important role in the discovery of new resonances, the most famous being the $J/\psi$, which led to the “November revolution” in particle physics [1, 2]. Secondly, scan experiments can provide much accurate information about particles, such as the accurate measurement of the $\tau$ lepton mass [?] and of the Z resonance parameters [?]. Thirdly, scan measurements contribute substantially to understanding present theory, for example the R-value measurement [?] and the phase angle measurement [10], both of which are crucial for quantum chromodynamics research.

* E-mail: moxh@ihep.ac.cn

A scan search is always intriguing and exciting. However, the scan range is usually unpredictably large, since no one knows where a new particle will show up. It is therefore fair to pay tribute to the pioneers for their bravery and diligence in looking for a needle in a haystack. The situation has since changed step by step. Nowadays, more and more particles have been discovered and accelerator luminosities keep increasing, so scan experiments have begun to play a new role in physics, especially in high-precision measurements. High-precision measurements help us understand existing theory more deeply and may also lead to new discoveries as accuracy improves. However, since a scan experiment is usually performed at many energy points, the optimal choice of the energy positions and of the luminosity distribution among them becomes an increasingly prominent issue, one that directly affects the efficiency of data taking.

Scan optimization is not a trivial affair. For example, a statistical optimization study of the $\tau$ mass scan found that one energy point is enough for a one-parameter fit [?]. Furthermore, subsequent studies [?] indicate that for a fit with n free parameters, n scan points are enough to give optimal results. In fact, the fewer the points, the more efficiently the accelerator works, since much of the tuning time can be dispensed with. To this extent, an optimization theory that determines the minimal number of points is of great importance for the practical design of data taking in scan experiments.

The theory of second optimization for scan experiments, described in the following sections, provides the optimal scheme for scan experiments aiming at accurate measurements of the parameters of interest. Section 2 introduces the concept of second optimization, which is the kernel of the following study. In Sect. 3 the sampling method is adopted to explore the optimal scan scheme, with the $\tau$ mass measurement as a concrete example. In Sect. 4 the analytical theory of second optimization is established on the basis of elementary numerical optimization. Section 5 is devoted to discussions of the equivalence between likelihood and chisquare fits, the effect of systematic uncertainty on the optimization, the correlation problem of systematic uncertainty, the multiple-solution issue related to the objective function, and the merits of the sampling method and of the analytical theory. Finally, the key conclusions are summarized in Sect. 6.


2 Notion of Second Optimization 

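The chisquare form for a scan experiment reads

$$\chi^2 = \sum_{i=1}^{m} \frac{\left(N_i^{obs} - N_i^{th}\right)^2}{\left(\Delta_i^{obs}\right)^2}, \qquad (1)$$
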
where i denotes the i-th scan point and the total number of scan points is m. N is the number of events, of which there are two kinds: the observed number of events ($N^{obs}$) and the theoretical number of events ($N^{th}$). The relation between the event number ($N$), luminosity ($L$), efficiency ($\varepsilon$), and cross section ($\sigma$) is expressed as

$$N = L\varepsilon\sigma. \qquad (2)$$

Generally speaking, for a scan over a large energy range, the efficiency is energy dependent and differs from point to point; for a comparatively small scan range, such as the $\tau$ mass scan or scans of the narrow $J/\psi$ and $\psi'$ resonances, the efficiency can be treated as a constant, that is

$$\varepsilon_i = \varepsilon, \quad i = 1, 2, \ldots, m. \qquad (3)$$

This assumption has essentially no effect on the general conclusions of this paper and is assumed throughout what follows. $L_i$ denotes the luminosity at the i-th point; its relation to the total luminosity ($L$) is

$$L_i = x_i L, \quad \text{with} \quad \sum_{i=1}^{m} x_i = 1. \qquad (4)$$

Here $x_i$ denotes the luminosity allocation at point i. $\Delta_i^{obs}$ is the error of the observed number of events. For a Poisson distribution,

$$\Delta_i^{obs} = \sqrt{N_i^{obs}}, \qquad (5)$$

which is the form adopted in the following study.

The observed cross section can be measured from the observed number of events through relation (2). The theoretical cross section is usually obtained from theoretical calculations involving some parameters, which can be determined by fitting the experimental data. Mathematically,

$$\sigma^{th} = \sigma(\theta), \quad \theta = (\theta_1, \theta_2, \ldots, \theta_n)^T. \qquad (6)$$

Here $\theta$ is the parameter vector, with n parameters in total, and T denotes the transpose of a vector or matrix. In addition, the luminosity allocation vector is introduced and defined as

$$x = (x_1, x_2, \ldots, x_m)^T. \qquad (7)$$




For convenience, the observed cross section is denoted simply as $\sigma$, that is $\sigma = \sigma^{obs}$; the theoretical cross section is denoted as $\sigma^{th}$ or $\sigma(\theta)$, the latter form stressing the dependence of the cross section on the parameters. With this notation, $\chi^2$ can be recast as

$$\chi^2(\theta, x) = L\varepsilon \sum_{i=1}^{m} \frac{x_i}{\sigma_i}\left[\sigma_i - \sigma_i(\theta)\right]^2. \qquad (8)$$

In the above expression, the parameters $\theta$ and $x$ define the two optimization problems we want to study. For a given $x$, minimizing $\chi^2$ yields a set of optimal parameters, denoted $\theta^*$. This is the usual optimization in experimental data analysis and is called the first optimization. Obviously, the errors of $\theta^*$ depend on the value of $x$.

Therefore, under the constraint of a fixed total luminosity, i.e. $\sum_{i=1}^{m} x_i = 1$, an optimization over $x$ is performed in order to obtain the smallest errors of $\theta^*$. This optimization over $x$ is called the second optimization.
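As a consistency check on the notation, the short sketch below (assuming NumPy; the function names and all numerical inputs are illustrative, not taken from the paper) evaluates the chisquare both in the event-count form with Poisson errors, using Eqs. (2), (4) and (5), and in the cross-section form of Eq. (8), and confirms that the two agree.

```python
import numpy as np

def chi2_counts(n_obs, n_th):
    """Event-count form: sum_i (N_obs - N_th)^2 / (Delta_obs)^2 with Delta_obs = sqrt(N_obs)."""
    n_obs = np.asarray(n_obs, dtype=float)
    n_th = np.asarray(n_th, dtype=float)
    return float(np.sum((n_obs - n_th) ** 2 / n_obs))

def chi2_cross_sections(sigma_obs, sigma_th, x, L, eps):
    """Eq. (8): L*eps * sum_i x_i/sigma_i * (sigma_i - sigma_i(theta))^2."""
    sigma_obs = np.asarray(sigma_obs, dtype=float)
    sigma_th = np.asarray(sigma_th, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(L * eps * np.sum(x / sigma_obs * (sigma_obs - sigma_th) ** 2))

# Illustrative inputs (made up): three scan points.
L, eps = 100.0, 0.142                       # total luminosity (pb^-1), constant efficiency
x = np.array([0.5, 0.3, 0.2])               # luminosity allocation, sums to 1 (Eq. 4)
sigma_obs = np.array([50.0, 80.0, 120.0])   # "observed" cross sections (pb)
sigma_th = np.array([48.0, 83.0, 118.0])    # theoretical sigma(theta) (pb)

L_i = x * L                                 # Eq. (4)
n_obs = L_i * eps * sigma_obs               # Eq. (2) with constant efficiency (Eq. 3)
n_th = L_i * eps * sigma_th

print(chi2_counts(n_obs, n_th), chi2_cross_sections(sigma_obs, sigma_th, x, L, eps))
```

The equality follows because $(N^{obs}_i - N^{th}_i)^2/N^{obs}_i = L_i\varepsilon(\sigma_i - \sigma_i(\theta))^2/\sigma_i$ term by term.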

In the following sections, the second optimization is first studied with the sampling method. The $\tau$ mass scan is taken as an example; owing to its simplicity, the essence of the sampling method can be exhibited pedagogically. Then the analytical theory based on optimization principles is established, which settles the issue of scheme design for scan experiments thoroughly.

A remark is in order here. For the $\tau$ mass scan, the conventional likelihood estimator is adopted, which is equivalent to the chisquare estimator for the first optimization (refer to subsection 5.1). As far as the second optimization is concerned, the choice of estimator is actually irrelevant, since it affects only the first optimization.


3 Sampling Method 


For the $\tau$ mass ($m_\tau$) scan, several points, say $N_{pt}$ points in total, need to be taken in the vicinity of the $m_\tau$ threshold. From the analyzed data, the following likelihood function is constructed [?]:

$$LF(m_\tau) = \prod_{i=1}^{N_{pt}} \frac{\mu_i^{N_i}\, e^{-\mu_i}}{N_i!}, \qquad (9)$$


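where $N_i$ is the observed number of $\tau^+\tau^-$ events selected in the $e\mu$-tagged final state at the i-th scan point (here the $e\mu$ channel means $\tau^+ \to e^+\nu_e\bar{\nu}_\tau$, $\tau^- \to \mu^-\bar{\nu}_\mu\nu_\tau$, or $\tau^+ \to \mu^+\nu_\mu\bar{\nu}_\tau$, $\tau^- \to e^-\bar{\nu}_e\nu_\tau$). $N_i$ is assumed to obey a Poisson distribution whose expectation $\mu_i$ is given by

$$\mu_i(m_\tau) = \left[\varepsilon \cdot B_{e\mu} \cdot \sigma^{obs}(m_\tau, E^i_{cm}) + \sigma_{BG}\right] \cdot L_i. \qquad (10)$$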


In Eq. (10), $L_i$ is the luminosity at the i-th point; $\varepsilon$ is the overall efficiency of the $e\mu$ final state for identifying $\tau^+\tau^-$ events, including trigger and event-selection efficiencies; $B_{e\mu}$ is the combined branching ratio for the decays $\tau^+ \to e^+\nu_e\bar{\nu}_\tau$ and $\tau^- \to \mu^-\bar{\nu}_\mu\nu_\tau$, or the corresponding charge-conjugate mode; $\sigma^{obs}$ (with $m_\tau$ as a parameter), which can be calculated with the improved Voloshin formulas [?], is the observed cross section at point i with center-of-mass energy $E^i_{cm}$; and $\sigma_{BG}$ is the total cross section of background channels after $\tau^+\tau^-$ selection. If $m_\tau$ is set as a free parameter, maximizing LF in Eq. (9) (equivalently, minimizing $-\ln LF$) yields the best estimate of $m_\tau$.
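A minimal sketch of the negative log-likelihood behind Eqs. (9) and (10), assuming NumPy. The improved Voloshin cross section is not reproduced here: `toy_sigma_obs` is a hypothetical logistic threshold shape (plateau of about 0.1 nb, i.e. 100 pb, the order of magnitude quoted later in this section, with an arbitrary width), used only so that the example runs; all function names are illustrative. The $\ln N_i!$ term is dropped because it does not depend on $m_\tau$, so minimizing this function is equivalent to maximizing LF.

```python
import numpy as np

def toy_sigma_obs(m_tau, e_cm, plateau=100.0, width=0.002):
    """Hypothetical stand-in for the observed tau-pair cross section (pb).

    A logistic step centred at the threshold 2*m_tau (GeV); NOT Voloshin's formula.
    """
    return plateau / (1.0 + np.exp(-(e_cm - 2.0 * m_tau) / width))

def expected_counts(m_tau, e_cm, lum, eff, br_emu, sigma_bg):
    """Eq. (10): mu_i = [eff * B_emu * sigma_obs(m_tau, E_i) + sigma_BG] * L_i."""
    return (eff * br_emu * toy_sigma_obs(m_tau, e_cm) + sigma_bg) * lum

def neg_log_likelihood(m_tau, n_obs, e_cm, lum, eff, br_emu, sigma_bg):
    """-ln LF of Eq. (9) without the m_tau-independent ln(N_i!) term."""
    mu = expected_counts(m_tau, e_cm, lum, eff, br_emu, sigma_bg)
    return float(np.sum(mu - n_obs * np.log(mu)))

# Usage: noiseless ("Asimov") pseudo-data N_i = mu_i at an input mass, then a grid scan.
e_cm = np.array([3.550, 3.554, 3.560])      # GeV, illustrative scan points
lum = np.array([15.0, 15.0, 15.0])          # pb^-1 per point
eff, br_emu, sigma_bg = 0.142, 0.06194, 0.0
m_true = 1.77699                            # GeV
n_asimov = expected_counts(m_true, e_cm, lum, eff, br_emu, sigma_bg)

grid = np.linspace(1.7760, 1.7780, 2001)    # 1 keV steps
nll = [neg_log_likelihood(m, n_asimov, e_cm, lum, eff, br_emu, sigma_bg) for m in grid]
m_fit = grid[int(np.argmin(nll))]
```

With Asimov data the grid minimum reproduces the input mass, which is a sanity check of the likelihood code rather than a physics result.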

Besides $m_\tau$, $\varepsilon$ and $\sigma_{BG}$ can be free parameters as well. The sampling technique is used to work out the optimal scan scheme step by step for one-parameter ($m_\tau$), two-parameter ($m_\tau$ and $\varepsilon$), and three-parameter ($m_\tau$, $\varepsilon$, and $\sigma_{BG}$) fits [11, 12].

3.1 One parameter optimization 

To achieve a high-precision $m_\tau$ measurement, we want to find out:

1. What is the optimal distribution (positions) of the data-taking points?

2. How many energy points are needed for the scan in the vicinity of the threshold?

3. How much luminosity is required to reach a given precision?

In the following study of statistical uncertainty, we take the efficiency $\varepsilon = 14.2\%$ [18], the energy spread $\Delta = 1.4$ MeV [18], and $B_{e\mu} = 0.06194$ [19], and neglect their uncertainties, whose effects are generally small [?]. As to $\sigma_{BG}$, previous experience [?] indicates that $\sigma_{BG} \approx 0.024$ pb, which is fairly small compared with the $\tau^+\tau^-$ production cross section (0.1 nb) near threshold. Moreover, at a high-luminosity accelerator a large data sample can be taken below threshold to measure $\sigma_{BG}$ accurately. When fixed as a constant in the fit, $\sigma_{BG}$ has a tiny effect on the optimization of the point distribution. Therefore, for the one-parameter optimization, $\sigma_{BG}$ is set to zero, i.e. the study is background free.

In the following exploration, the value of $m_\tau$ is assumed to be known and is set to $m_\tau = 1776.99$ MeV according to PDG06 [19]; under this assumption, we attempt to answer the three questions above. On closer inspection, however, the first two questions are intertwined: the optimal number of points depends on the distribution of the points and vice versa. To resolve this dilemma, we start from a simple distribution, find the optimal number of points for it, and then use that result to fix the final number of points.

3.1.1 First search

As a tentative beginning, the energy interval under study is divided evenly, viz.

$$E_i = E_0 + (i-1)\times\delta E, \quad (i = 1, 2, \ldots, N_{pt}) \qquad (11)$$

where the initial point $E_0 = 3.545$ GeV, the final point $E_f = 3.595$ GeV, and the fixed step $\delta E = (E_f - E_0)/N_{pt}$, with $N_{pt}$ being the number of energy points. A given total luminosity ($L$) is also apportioned evenly among the points, i.e. $L_i = L/N_{pt}$.

Figure 1: Flow chart of the sampling simulation, where $i$ ($i = 1, 2, \ldots, N_{pt}$) labels a scheme and $j$ ($j = 1, 2, \ldots, N_{samp}$) the sampling index.
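The evenly divided scheme of Eq. (11) can be sketched as follows (assuming NumPy; `even_scan_scheme` is a hypothetical helper). Note that with $\delta E = (E_f - E_0)/N_{pt}$ as stated, the last point sits at $E_f - \delta E$ rather than at $E_f$.

```python
import numpy as np

def even_scan_scheme(n_pt, total_lum, e0=3.545, ef=3.595):
    """Eq. (11): E_i = E_0 + (i-1)*dE with dE = (E_f - E_0)/N_pt, and L_i = L/N_pt."""
    d_e = (ef - e0) / n_pt
    energies = e0 + np.arange(n_pt) * d_e      # (i-1) runs over 0..N_pt-1
    lums = np.full(n_pt, total_lum / n_pt)     # luminosity shared evenly
    return energies, lums

# Five points sharing 30 pb^-1 (GeV, pb^-1).
energies, lums = even_scan_scheme(5, 30.0)
```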


For each scheme (that is, for each $N_{pt}$), the sampling is repeated many times to reduce statistical fluctuations (the number of samplings is denoted $N_{samp}$), and the average value and the corresponding variance of the fitted variables are computed as follows [20]:

$$\bar{X}^i = \frac{1}{N_{samp}} \sum_{j=1}^{N_{samp}} X^i_j, \qquad (12)$$

$$S^2_X(\bar{X}^i) = \frac{1}{N_{samp} - 1} \sum_{j=1}^{N_{samp}} \left(X^i_j - \bar{X}^i\right)^2, \qquad (13)$$

where $X$ denotes the free fit parameter, which can be $m_\tau$, $\varepsilon$, and/or $\sigma_{BG}$. Note that $i$ labels the scheme, while $j$ labels the sampling; the number of samplings is 200 in the following study. Unless stated otherwise, averages are understood in the sense of Eqs. (12) and (13) in what follows. The general flow chart of the sampling and fitting procedure is presented in Fig. 1.

For the one-parameter optimization, with $X = m_\tau$, $N_{samp} = 200$, the experimental parameters $\varepsilon$, $\Delta$, and $B_{e\mu}$ given above, $L = 30$ pb$^{-1}$, and $N_{pt}$ ranging from 3 to 20, the fit results are shown in Fig. 2(a).
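Eqs. (12) and (13) amount to the sample mean and the sample variance of the $N_{samp}$ fit results of one scheme. The sketch below assumes NumPy and the usual unbiased $N_{samp} - 1$ normalization; the Gaussian draws merely stand in for real fit outputs and are not results from the paper.

```python
import numpy as np

def sample_mean_and_variance(fit_values):
    """Eqs. (12)-(13): average of N_samp fit results and their unbiased variance."""
    x = np.asarray(fit_values, dtype=float)
    n_samp = x.size
    mean = x.sum() / n_samp                          # Eq. (12)
    var = ((x - mean) ** 2).sum() / (n_samp - 1)     # Eq. (13)
    return mean, var

# Stand-in for N_samp = 200 fitted m_tau values (MeV) from one scheme;
# drawn from a Gaussian purely for illustration, not from a real fit.
rng = np.random.default_rng(seed=1)
fits = rng.normal(loc=1776.99, scale=0.15, size=200)
mean, var = sample_mean_and_variance(fits)
s_m = np.sqrt(var)   # S_{m_tau}, the figure of merit used to rank schemes
```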

Here $S_{m_\tau}$ is the uncertainty of the fitted value of $m_\tau$ and is used to assess the quality of the fit: the smaller $S_{m_\tau}$, the better the fit. Clearly, too few data-taking points lead to a large uncertainty, while too many points bring no further improvement in precision either. As indicated in Fig. 2(a), $N_{pt} = 5$ is taken as the optimal number of points for the evenly divided scheme.

Figure 2: (a) The variation of $S_{m_\tau}$ against the number of points $N_{pt}$. (b) The distributions of data-taking points with the smallest and the greatest $S_{m_\tau}$, denoted by stars and diamonds, respectively. The solid curve is the calculated observed cross section, and the dashed line is the corresponding derivative of the cross section with respect to energy, scaled by a factor of $10^{-2}$.


3.1.2 Second search

With five points, we further search for the distribution that gives a small fit uncertainty. In the absence of theoretical or empirical guidance, various possibilities are tried with the sampling technique, i.e. the energy points are taken randomly in the chosen interval. Out of 200 samplings, two fit results are singled out: those with the smallest ($S_{m_\tau} = 0.152$ MeV, denoted by stars) and the greatest ($S_{m_\tau} = 1.516$ MeV, denoted by diamonds) fit uncertainties; their distributions are shown in Fig. 2(b). The uncertainty is evidently small when the points crowd near the threshold and large when the points are far from it. More precisely, the smallest uncertainty is obtained when the points gather in the region where the derivative of the cross section with respect to energy is large. This suggests that the region with a large derivative is presumably the optimal position for data taking. We verify this conjecture next.

To locate the sensitive positions, two regions are selected as shown in Fig. 3(a): region I ($E_{cm} \in (3.553, 3.558)$ GeV), bounded by the energies at which the derivative falls to 75% of its maximum, and region II ($E_{cm} \in (3.565, 3.595)$ GeV), where the derivative varies more smoothly than in region I.

To verify the above conjecture, two schemes are designed. In the first scheme, two points are taken in region I, one at 3.55398 GeV as the threshold point and the other at 3.5548 GeV, corresponding to the point of largest derivative. In region II, the number of points $N_{pt}$ increases from 1 to 20, with a luminosity of 5 pb$^{-1}$ at each point. The fit results are displayed in Fig. 3(b) as crosses. Clearly, adding points in region II hardly improves the accuracy ($S_{m_\tau} = 0.25$ MeV remains almost unchanged as the number of points in region II increases). That is to say, region I is the sensitive region as far as the fit uncertainty is concerned, while region II is not. To confirm this further, in the second scheme only points in region II are taken, with $N_{pt}$ again increasing from 1 to 20. The fit results are displayed in Fig. 3(b) as diamonds. As expected, $S_{m_\tau}$ decreases as the number of points increases, but even with 20 points in region II, $S_{m_\tau} = 0.73$ MeV is still much larger than the value obtained with only two points in region I. We therefore conclude that points within region I are more useful for optimal data taking.

Figure 3: (a) Two subregions, denoted by I and II, with different derivative features; the solid line denotes the observed cross section and the dashed line the corresponding derivative, scaled by a factor of $10^{-2}$. (b) The fit uncertainties for the different schemes; crosses and diamonds denote the results of the first and second schemes described in the text, respectively.


3.1.3 Third search

In this subsection, the first question is how many points are optimal in the region with a large derivative. Following the procedure of subsection 3.1.1, a total luminosity $L = 45$ pb$^{-1}$ is apportioned evenly among $N_{pt}$ points ($N_{pt} = 1, 2, \ldots, 6$) within the energy region from 3.553 to 3.557 GeV. The results, shown in Fig. 4(a), indicate that the number of points has only a weak effect on the final uncertainty. In other words, within the large-derivative region, one point suffices to give a small uncertainty. This is easy to understand: since only one free parameter ($m_\tau$) needs to be fitted in the $\tau^+\tau^-$ production cross section, a single measurement suffices to fix the curve. The larger the derivative, the more sensitive the cross section is to the mass of the $\tau$ lepton.

Figure 4: (a) The relation between $S_{m_\tau}$ and the number of points within the energy region from 3.553 to 3.555 GeV. (b) and (c) The variation of $S_{m_\tau}$ against energy for a one-point fit with $L = 45$ pb$^{-1}$; in plot (b) the scan region is from 3.551 to 3.595 GeV, while in plot (c) it is from 3.55330 to 3.55694 GeV. The solid line denotes the derivative of the cross section, scaled by $10^{-3}$, and the dashed line the observed cross section, scaled by $10^{-1}$. A and B denote the positions with the smallest $S_{m_\tau}$ and the greatest derivative of the cross section, respectively.


Since one point is enough, an immediate question is where this point should be located. To answer it, a one-point scan with luminosity $L = 45$ pb$^{-1}$ is performed; the results are shown in Fig. 4(b). As the previous study indicated, a small uncertainty is achieved near the peak of the derivative. Looking into the region from 3.5520 to 3.5565 GeV (Fig. 4(c)), the smallest $S_{m_\tau} = 0.105$ MeV is obtained near the $m_\tau$ threshold ($E_{cm} = 3.55398$ GeV), which deviates from the position of the greatest derivative of the cross section ($E_{cm} = 3.55484$ GeV), where $S_{m_\tau} = 0.122$ MeV. In addition, the study indicates that within a 2 MeV region the variation of $S_{m_\tau}$ is fairly smooth (from 0.105 to 0.127 MeV), which is very favorable for actual data taking.


3.1.4 Luminosity and uncertainty

The empirical relation between the fit uncertainty $S_{m_\tau}$ and the total luminosity $L$ can be fitted from the data provided in Ref. [?] as

$$S_{m_\tau}\,[\mathrm{keV}] = \frac{708.05}{\left(L\,[\mathrm{pb}^{-1}]\right)^{0.504}}, \qquad (14)$$

which indicates that 49 pb$^{-1}$ is sufficient for a statistical precision better than 0.1 MeV.
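Eq. (14) can be evaluated and inverted directly; the sketch below (plain Python, hypothetical function names) gives the luminosity required for a target statistical uncertainty. Inverting at 100 keV gives roughly 48.6 pb$^{-1}$, consistent with the statement above; the empirical fit is only meaningful within the luminosity range it was fitted on.

```python
def s_mtau_kev(lum_pb):
    """Empirical Eq. (14): statistical S_{m_tau} in keV for total luminosity L in pb^-1."""
    return 708.05 / lum_pb ** 0.504

def required_lum_pb(target_kev):
    """Invert Eq. (14): luminosity (pb^-1) needed for a target S_{m_tau} in keV."""
    return (708.05 / target_kev) ** (1.0 / 0.504)

print(s_mtau_kev(49.0))        # uncertainty at 49 pb^-1
print(required_lum_pb(100.0))  # pb^-1 needed for 0.1 MeV
```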







3.2 Multiple parameters optimization 


3.2.1 Position determination 


As noted above, the optimal number of points depends on the distribution of the points and vice versa. In the one-parameter case, we used the sampling technique to take energy points randomly in the chosen interval, which in principle exhausts all possibilities and ensures the optimality of the final scheme. However, such a method is infeasible for multiple-parameter fits because of the increasing complexity of the fit; for example, the fit always fails when two energy points are too close to each other. Before the analytical theory is established, it is expedient to adopt the “independence conjecture”, i.e. that the optimization for one parameter is independent of the others. In practice, we fix the optimal positions already found and vary only one energy point in a single-parameter scan to find its optimal position. Once all optimal positions have been found, we examine several alternatives to confirm the optimality of the resulting scheme.
Figure 5: (a) The variation of $S_{m_\tau}$ and $S_\varepsilon$ as the second energy point is scanned from 3.554 to 3.595 GeV. The small boxes and the small triangles represent $S_{m_\tau}$ and $S_\varepsilon$, respectively. The solid line denotes the derivative of the cross section, scaled by 0.001, and the dotted line denotes the cross section, scaled by 0.1. (b) The relationship between the error of $m_\tau$ and the number of data-taking points. (c) The variation of the error of $m_\tau$ with the energy position.


Based on the one-parameter results, the first point is fixed at the $\tau$ threshold ($E_1 = 3.55379$ GeV) to determine $m_\tau$. For the newly added fit parameter $\varepsilon$, the quantity $S_\varepsilon$ is used to find the optimal position of $E_2$ as it is moved up in energy. Figure 5(a) shows $S_\varepsilon$ as a function of the position of the second point. Clearly, $S_\varepsilon$ decreases as the derivative decreases. Therefore, the second point can be placed far above the $\tau$ threshold on the high-energy side, for example at $E_{cm} = 3.6$ GeV. Figure 5(a) also shows that the uncertainty of $m_\tau$ remains almost the same once the upper energy point exceeds 3.58 GeV, which means that the upper limit of the scan is not crucial for the $m_\tau$ determination; it can be chosen with considerable freedom, e.g. 3.59, 3.595, 3.6, or 3.605 GeV. However, above 3.65 GeV the effect of the $\psi(3686)$ resonance becomes visible [21]. Therefore, the upper limit of the $m_\tau$ scan should be below 3.65 GeV.

These conclusions are easy to understand. First, for one parameter, since only one free parameter ($m_\tau$) needs to be fitted in the $\tau^+\tau^-$ production cross section, a single measurement fixes the shape of the curve. Second, the fitted parameter is sensitive to the variation of the curve, which is mathematically reflected by its derivative. Hence the sensitive point for $m_\tau$ lies in the region with a large derivative. As shown in Fig. 3, two regions are selected: region I ($E_{cm} \in (3.553, 3.558)$ GeV), bounded by the energies at which the derivative falls to 75% of its maximum, and region II ($E_{cm} \in (3.565, 3.595)$ GeV), where the derivative varies more smoothly than in region I. In region I the derivative varies rapidly with energy, so this region is sensitive to horizontal changes of the curve (i.e. changes of the energy scale). Region I is therefore optimal for $m_\tau$, which is determined by both the shape of the cross section curve and the energy scale. By contrast, the derivative in region II varies smoothly, so this region is insensitive to horizontal changes but can be sensitive to vertical changes. Region II can thus be expected to be optimal for the efficiency, which determines the overall normalization of the curve. This is exactly what the scan of $S_\varepsilon$ shows.

Based on the results above, the two parameters $m_\tau$ and $\varepsilon$ can be determined by the optimized first and second points, located at $E_1 = 3.55379$ GeV and $E_2 = 3.6$ GeV, respectively. For the newly added fit parameter $\sigma_{BG}$, we divide a luminosity of 20 pb$^{-1}$ among 1, 2, 3, 4, or 5 points within the range from 3.50 to 3.54 GeV (the luminosities for points 1 and 2 are $L_1 = 75$ pb$^{-1}$ and $L_2 = 25$ pb$^{-1}$; the reason for this division is given in the next subsection). The fit results are shown in Fig. 5(b). The number of points has almost no effect on the fit uncertainty of $m_\tau$; in other words, one point (denoted point 3 hereafter) is enough to determine $\sigma_{BG}$. In a second step, with 20 pb$^{-1}$ at the third point, we perform the fit with $E_3$ = 3.50, 3.51, 3.52, 3.53, or 3.54 GeV. The relation between $S_{m_\tau}$ and the energy position, shown in Fig. 5(c), indicates that $S_{m_\tau}$ is almost independent of the energy position as long as it is below the $\tau^+\tau^-$ threshold. This is also understandable, since the $\tau^+\tau^-$ cross section below threshold is always zero. Hence any position below threshold is feasible for the $\sigma_{BG}$ determination; as an example, $E_3 = 3.50$ GeV is chosen as the third point.


3.2.2 Ratio determination 


Unlike the one-parameter fit, besides the relation between luminosity and precision, the luminosity allocation among the points must also be determined. As a first step, we begin with the two-parameter case. For a fixed total luminosity, say $L = 120$ pb$^{-1}$, different allocation schemes are checked and the results are displayed in Fig. 6(a). As expected, as $L_1$ increases ($L_2$ decreases), $S_{m_\tau}$ ($S_\varepsilon$) decreases (increases) correspondingly. The abnormal increase of $S_{m_\tau}$ ($S_\varepsilon$) in the extreme region where $L_2$ ($L_1$) is almost zero can be explained by the correlation between the fitted $m_\tau$ and $\varepsilon$. From the curve fitted to the data in Fig. 6(a), the minimal value of $S_{m_\tau}$ is achieved with $L_1 = 75$ pb$^{-1}$, or equivalently $L_1 : L_2 = 3 : 1$. Further checks indicate that this ratio is independent of the total luminosity [12].



Figure 6: (a) The variations of $S_{m_\tau}$ and $S_\varepsilon$ with increasing $L_1$. (b) The relationship between the statistical error of $m_\tau$ and the luminosity at the background point; the total luminosity is fixed at 120 pb$^{-1}$ and the ratio of the luminosities at the first and second points is 3 : 1. (c) The variation of $S_{m_\tau}$ in a two-dimensional scan of the luminosity at the third point and the ratio of the luminosities allotted to the first and second points; the total luminosity is fixed at 120 pb$^{-1}$.


In the second step, we fix the total luminosity at 120 pb$^{-1}$ and the ratio of luminosities at the first and second points at 3 : 1, and then increase the fraction of luminosity allotted to the third point to find the dependence of $S_{m_\tau}$ on the luminosity of point 3. As shown in Fig. 6(b), the smallest error, $S_{m_\tau} = 0.096$ MeV, is obtained when this luminosity equals 12 pb$^{-1}$, about 10% of the total. In summary, $L_3 = 10\% \cdot L$ together with $L_1 : L_2 = 3 : 1$ leads to the optimal value of $S_{m_\tau}$.

Several checks are performed to consolidate the optimal scheme obtained [12]. One of them is shown in Fig. 6(c): for a fixed total luminosity, say $L = 120$ pb$^{-1}$ with $L = L_1 + L_2 + L_3$, a two-dimensional scan of $S_{m_\tau}$ is performed with respect to $L_1/(L_1 + L_2)$ and $L_3$. Clearly, for fixed $L_3$ the smallest $S_{m_\tau}$ is obtained at $L_1/(L_1 + L_2) = 0.75$, while for this fixed ratio the smallest $S_{m_\tau}$ is obtained at $L_3 = 12$ pb$^{-1}$, i.e. 10% of the total luminosity. In fact, the smallest $S_{m_\tau}$ can be read directly from the three-dimensional plot, at coordinates $L_1/(L_1 + L_2) \approx 0.75$ and $L_3 \approx 10\% \cdot L$.

3.2.3 Scan scheme 




Figure 7: Left panel: the optimal points for the $m_\tau$ scan. The three solid circles denote the sensitive positions for parameters 3 ($\sigma_{BG}$), 1 ($m_\tau$), and 2 ($\varepsilon$), respectively. Right panel: the dependence of $S_{m_\tau}$ on $L$ for the one-, two-, and three-parameter fit strategies. Overlaid are fits of the form $A/L^{0.504}$, where $A$ is a constant determined by the fit.

Using the sampling technique, we finally fix the optimal scan scheme for a highly accurate $m_\tau$ measurement. For the three-parameter fit strategy, the positions of the three data-taking points are shown in the left panel of Fig. 7, where the three solid circles denote the sensitive positions for parameters 3 ($\sigma_{BG}$), 1 ($m_\tau$), and 2 ($\varepsilon$), respectively. The luminosity allocation among these points, $L_3/L = 10\%$ and $L_1 : L_2 = 3 : 1$, leads to the optimal fit results for the $m_\tau$ measurement. The relation between the total luminosity ($L$) and the uncertainty of $m_\tau$ ($S_{m_\tau}$) is shown in the right panel of Fig. 7, from which it can be seen that about 120 pb$^{-1}$ of data must be taken if a precision at the level of 0.1 MeV is required. The actual experiment at BESIII follows exactly this scheme.
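The final three-point scheme can be written down compactly. The sketch below (plain Python; `ScanPoint` and `three_point_scheme` are hypothetical names) takes the energies from Sect. 3.2.1 and the allocation rule $L_3/L = 10\%$, $L_1 : L_2 = 3 : 1$ from Sect. 3.2.2; for $L = 120$ pb$^{-1}$ it gives $L_3 = 12$ pb$^{-1}$, as in Fig. 6(b).

```python
from dataclasses import dataclass

@dataclass
class ScanPoint:
    label: str          # parameter the point is most sensitive to
    energy_gev: float
    lum_pb: float

def three_point_scheme(total_lum_pb, e_bg=3.50, e_thr=3.55379, e_high=3.6,
                       bg_fraction=0.10, ratio_12=3.0):
    """Positions from Sect. 3.2.1; allocation L3/L = 10%, L1:L2 = 3:1 (Sect. 3.2.2)."""
    l3 = bg_fraction * total_lum_pb               # background point below threshold
    rest = total_lum_pb - l3                      # shared by points 1 and 2
    l1 = rest * ratio_12 / (ratio_12 + 1.0)
    l2 = rest - l1
    return [
        ScanPoint("m_tau (point 1)", e_thr, l1),
        ScanPoint("efficiency (point 2)", e_high, l2),
        ScanPoint("sigma_BG (point 3)", e_bg, l3),
    ]

for p in three_point_scheme(120.0):
    print(f"{p.label:22s} E = {p.energy_gev:.5f} GeV  L = {p.lum_pb:.1f} pb^-1")
```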


4 Analytical Theory 

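Now we return to Eq. (8). Introducing

$$r_i(\theta) = \sqrt{\frac{L\varepsilon x_i}{\sigma_i}}\,\left[\sigma_i - \sigma_i(\theta)\right], \qquad (15)$$

the objective function $f$ is constructed as follows:

$$f(\theta) = r(\theta)^T r(\theta) = \sum_{i=1}^{m} \left[r_i(\theta)\right]^2. \qquad (16)$$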








Obviously, $f(\theta) = \chi^2(\theta, x)$; the argument $x$ is usually suppressed when only the first optimization is considered. The first and second derivatives of $f$ are expressed as follows:


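$$\nabla f(\theta) = 2h(\theta) \;\;[\text{gradient of } f(\theta)] = 2\sum_{i=1}^{m} r_i(\theta)\nabla r_i(\theta) = 2J(\theta)^T r(\theta); \qquad (17)$$

$$\nabla^2 f(\theta) = G(\theta) \;\;[\text{Hesse matrix of } f(\theta)] = 2\sum_{i=1}^{m}\left\{\nabla r_i(\theta)\nabla r_i(\theta)^T + r_i(\theta)\nabla^2 r_i(\theta)\right\} = 2\left\{J(\theta)^T J(\theta) + S(\theta)\right\}, \qquad (18)$$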


where

$$J(\theta) = \begin{pmatrix} \nabla r_1(\theta)^T \\ \vdots \\ \nabla r_m(\theta)^T \end{pmatrix} = \left[\frac{\partial r_i(\theta)}{\partial \theta_j}\right]_{m \times n}, \qquad (19)$$

which is an $m \times n$ matrix, and $S(\theta) = \sum_{i=1}^{m} r_i(\theta)\nabla^2 r_i(\theta)$, in which $r_i(\theta)$ is simply the normalized residual [23]. Statistically, the normalized residuals should be small and scattered randomly around zero in the vicinity of the minimum of $f(\theta)$; hence, when summed, these terms contribute negligibly to the Hessian $G(\theta)$. We therefore introduce the reduced Hesse matrix, defined as


$$H(\theta) = J(\theta)^T J(\theta). \qquad (20)$$
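To make Eqs. (15) to (20) concrete, the sketch below (assuming NumPy) uses a hypothetical two-parameter toy cross section $\sigma(E;\theta) = \theta_1 e^{-\theta_2 E}$, not the $\tau$ cross section, with noiseless pseudo-data and illustrative values of $L\varepsilon$ and $x_i$. It builds the residuals of Eq. (15) and the Jacobian of Eq. (19), estimates the full Hessian $G(\theta)$ by central differences of $\nabla f = 2J^T r$, and compares it with $2H(\theta)$. At the zero-residual minimum the $S(\theta)$ term vanishes and the two agree; away from the minimum they differ by $2S(\theta)$.

```python
import numpy as np

E = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # toy abscissae
W = np.full(5, 0.2)                            # luminosity allocation x_i (sums to 1)
L_EPS = 50.0                                   # product L*eps (illustrative)

def model(theta):
    """Hypothetical toy cross section sigma(E; theta) = theta_1 * exp(-theta_2 * E)."""
    return theta[0] * np.exp(-theta[1] * E)

THETA_TRUE = np.array([2.0, 0.5])
SIGMA_OBS = model(THETA_TRUE)                  # noiseless data: zero-residual case

def residuals(theta):
    """Eq. (15): r_i = sqrt(L*eps*x_i/sigma_i) * (sigma_i - sigma_i(theta))."""
    return np.sqrt(L_EPS * W / SIGMA_OBS) * (SIGMA_OBS - model(theta))

def jacobian(theta):
    """Eq. (19): J[i, j] = d r_i / d theta_j, an m x n matrix."""
    w = np.sqrt(L_EPS * W / SIGMA_OBS)
    e = np.exp(-theta[1] * E)
    return np.column_stack([-w * e, w * theta[0] * E * e])

def gradient(theta):
    """Eq. (17): grad f = 2 J^T r."""
    return 2.0 * jacobian(theta).T @ residuals(theta)

def full_hessian_fd(theta, h=1e-5):
    """Central differences of the gradient: numerical estimate of G(theta), Eq. (18)."""
    n = theta.size
    G = np.empty((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        G[:, j] = (gradient(theta + step) - gradient(theta - step)) / (2.0 * h)
    return G

def reduced_hessian(theta):
    """Eq. (20): H = J^T J."""
    J = jacobian(theta)
    return J.T @ J

G_min = full_hessian_fd(THETA_TRUE)
H_min = reduced_hessian(THETA_TRUE)
```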

Next, the Gauss-Newton algorithm is adopted to obtain the optimal parameters:

Gauss-Newton Algorithm

1. Given an initial parameter vector $\theta_0$, assign the precisions $\omega_1$, $\omega_2$, and set $k = 0$;

2. Compute $r_k = r(\theta_k)$, $f_k = f(\theta_k)$;

3. Compute

$$h_k = h(\theta_k) = J(\theta_k)^T r(\theta_k), \qquad H_k = H(\theta_k) = J(\theta_k)^T J(\theta_k);$$

4. Compute $\theta_{k+1} = \theta_k - H_k^{-1} h_k$;

5. Compute $r_{k+1} = r(\theta_{k+1})$, $f_{k+1} = f(\theta_{k+1})$;

6. Check the H-criterion; if it is satisfied, output $r_{k+1}$, $f_{k+1}$ and stop; otherwise, set $k = k + 1$, $r_k = r_{k+1}$, $f_k = f_{k+1}$, and go to step 3.









The H-criterion is the so-called Himmelblau convergence criterion, defined as follows. Assume that $\theta_k$, $\theta_{k+1}$, $f_k$, and $f_{k+1}$ have been computed and that $\omega_1$, $\omega_2$ are given precisions. If $\|\theta_k\| < \omega_1$ and $|f_k| < \omega_1$, then use

$$\|\theta_{k+1} - \theta_k\| < \omega_2 \quad \text{and} \quad |f_{k+1} - f_k| < \omega_2$$

as convergence criteria; if $\|\theta_k\| \ge \omega_1$ and $|f_k| \ge \omega_1$, then take

$$\frac{\|\theta_{k+1} - \theta_k\|}{\|\theta_k\|} < \omega_2 \quad \text{and} \quad \frac{|f_{k+1} - f_k|}{|f_k|} < \omega_2$$

as convergence criteria. It is worth reminding the reader that $k$ here is the iteration index rather than a vector-component subscript; no confusion arises in this paper, because the intended meaning is always clear from the context.
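The algorithm and the H-criterion above can be sketched in Python as follows (assuming NumPy; the function names are illustrative). Two details are our assumptions rather than the text's: step 4 solves $H_k \Delta = h_k$ instead of forming $H_k^{-1}$ explicitly, and in the mixed case the criterion does not cover (e.g. $\|\theta_k\| \ge \omega_1$ but $|f_k| < \omega_1$) each quantity uses its own absolute or relative test. The fit uses a hypothetical exponential toy model with noiseless data, not the $\tau$ cross section.

```python
import numpy as np

def himmelblau_converged(theta_k, theta_next, f_k, f_next, w1, w2):
    """H-criterion: absolute tests when ||theta_k||, |f_k| < w1, relative tests when >= w1.

    Assumption: in the mixed case each quantity picks its own absolute/relative form.
    """
    norm_k = np.linalg.norm(theta_k)
    d_theta = np.linalg.norm(theta_next - theta_k)
    d_f = abs(f_next - f_k)
    theta_ok = d_theta < w2 if norm_k < w1 else d_theta / norm_k < w2
    f_ok = d_f < w2 if abs(f_k) < w1 else d_f / abs(f_k) < w2
    return theta_ok and f_ok

def gauss_newton(residual, jacobian, theta0, w1=1e-6, w2=1e-10, max_iter=100):
    """Steps 1-6 of the Gauss-Newton algorithm: theta_{k+1} = theta_k - H_k^{-1} h_k."""
    theta = np.asarray(theta0, dtype=float)                  # step 1
    r = residual(theta)                                       # step 2
    f = float(r @ r)
    for _ in range(max_iter):
        J = jacobian(theta)                                   # step 3
        h = J.T @ r
        H = J.T @ J
        theta_next = theta - np.linalg.solve(H, h)            # step 4 (linear solve)
        r_next = residual(theta_next)                         # step 5
        f_next = float(r_next @ r_next)
        if himmelblau_converged(theta, theta_next, f, f_next, w1, w2):   # step 6
            return theta_next, f_next
        theta, r, f = theta_next, r_next, f_next
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical toy fit: sigma(E; theta) = theta_1 * exp(-theta_2 * E), data at theta = (2, 0.5).
E = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sigma_obs = 2.0 * np.exp(-0.5 * E)
weight = np.sqrt(50.0 * 0.2 / sigma_obs)     # sqrt(L*eps*x_i/sigma_i), illustrative values

def residual(theta):
    return weight * (sigma_obs - theta[0] * np.exp(-theta[1] * E))

def jacobian(theta):
    e = np.exp(-theta[1] * E)
    return np.column_stack([-weight * e, weight * theta[0] * E * e])

theta_hat, f_hat = gauss_newton(residual, jacobian, theta0=[1.9, 0.48])
```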

Our first task here is to prove the convergence of the Gauss-Newton algorithm. From now on, our aim is the second optimization, so the first optimization is always assumed to be feasible and solvable. Mathematically, the objective function is assumed to have fairly good analytical properties, such as continuity, differentiability, Lipschitz continuity over a certain neighborhood, positive definiteness of the relevant matrices, and so forth. Some subsidiary mathematical material is compiled in the appendix. Nevertheless, two lemmas needed as prerequisites for the convergence proof are presented below.

Lemma 1. Let $A$ and $B$ be $n \times n$ matrices, let $A$ be nonsingular with $\|A^{-1}\| \le \alpha$, let $\|B - A\| \le \beta$, and let $\alpha\beta < 1$. Then $B$ is nonsingular and
$$\|B^{-1}\| \le \frac{\alpha}{1 - \alpha\beta}. \qquad (21)$$

Proof. The first step is to establish the equality $B^{-1} = \sum_{k=0}^{\infty} (I - A^{-1}B)^k A^{-1}$. Notice that $(I - A^{-1}B)^k = [A^{-1}(A - B)]^k$ and $\|A^{-1}(A - B)\| \le \alpha\beta < 1$, whence (Proposition 1 of the appendix)
$$\sum_{k=0}^{\infty} (I - A^{-1}B)^k = [I - (I - A^{-1}B)]^{-1} = B^{-1}A.$$
Multiplying both sides on the right by $A^{-1}$ gives the needed result. The second step is to prove $\|\sum_{k=0}^{\infty} (I - A^{-1}B)^k A^{-1}\| \le \alpha/(1 - \alpha\beta)$:
$$\Big\|\sum_{k=0}^{\infty} (I - A^{-1}B)^k A^{-1}\Big\| \le \|A^{-1}\| \, \Big\|\sum_{k=0}^{\infty} (I - A^{-1}B)^k\Big\| \le \frac{\|A^{-1}\|}{1 - \|I - A^{-1}B\|} \le \frac{\alpha}{1 - \alpha\beta}.$$
□
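Lemma 1 can be checked numerically. The sketch below uses hypothetical $2 \times 2$ matrices and the induced infinity norm (maximum absolute row sum) as the matrix norm; it sums the series $\sum_k (I - A^{-1}B)^k A^{-1}$, compares it with $B^{-1}$, and checks the bound (21).

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm_inf(M):
    """Induced infinity norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)

A = [[2.0, 0.0], [0.0, 4.0]]          # hypothetical nonsingular A
E = [[0.3, -0.2], [0.1, 0.25]]        # hypothetical perturbation, B = A + E
B = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]

A_inv = inv2(A)
alpha, beta = norm_inf(A_inv), norm_inf(E)     # ||A^{-1}|| <= alpha, ||B - A|| <= beta
assert alpha * beta < 1.0

# Series sum_k (I - A^{-1} B)^k A^{-1}, truncated after 61 terms.
A_inv_B = matmul(A_inv, B)
X = [[(1.0 if i == j else 0.0) - A_inv_B[i][j] for j in range(2)] for i in range(2)]
term = [row[:] for row in A_inv]               # k = 0 term
series = [row[:] for row in A_inv]
for _ in range(60):
    term = matmul(X, term)                     # X^{k+1} A^{-1}
    series = [[series[i][j] + term[i][j] for j in range(2)] for i in range(2)]

B_inv = inv2(B)
bound = alpha / (1.0 - alpha * beta)           # right-hand side of Eq. (21)
print(norm_inf(B_inv), bound)
```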

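Lemma 2. Let $F: \mathbb{R}^n \to \mathbb{R}^m$ be continuous and differentiable over an open convex set $D \subset \mathbb{R}^n$, and let its derivative $F'$ be Lipschitz continuous over a neighborhood of any $\theta \in D$. Then for any $\theta + \delta \in D$ we have
$$\|F(\theta + \delta) - F(\theta) - F'(\theta)\delta\| \le \frac{\gamma}{2}\|\delta\|^2, \qquad (22)$$
where $\gamma$ is a Lipschitz constant.

Proof.
$$F(\theta + \delta) - F(\theta) - F'(\theta)\delta = \int_0^1 F'(\theta + t\delta)\delta \, dt - F'(\theta)\delta = \int_0^1 [F'(\theta + t\delta) - F'(\theta)]\delta \, dt,$$
therefore
$$\|F(\theta + \delta) - F(\theta) - F'(\theta)\delta\| \le \int_0^1 \|F'(\theta + t\delta) - F'(\theta)\| \, \|\delta\| \, dt \le \int_0^1 \gamma t \|\delta\|^2 \, dt = \gamma\|\delta\|^2 \int_0^1 t \, dt = \frac{\gamma}{2}\|\delta\|^2.$$
□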
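As a one-dimensional sanity check of Eq. (22), take the hypothetical choice $F = \sin$, whose derivative $\cos$ is Lipschitz continuous with $\gamma = 1$. The ratio of the left-hand side of (22) to $\gamma\delta^2/2$ should then never exceed 1.

```python
import math

gamma = 1.0  # Lipschitz constant of F'(x) = cos(x)

def remainder(theta, delta):
    """|F(theta + delta) - F(theta) - F'(theta) delta| for F = sin."""
    return abs(math.sin(theta + delta) - math.sin(theta) - math.cos(theta) * delta)

ratios = [remainder(0.1 * i, d) / (0.5 * gamma * d * d)
          for i in range(-30, 31)
          for d in (0.01, 0.1, 0.5, 1.0, -0.3, -2.0)]
worst = max(ratios)
print(worst)
```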


Theorem 1 (Convergence of Gauss-Newton Algorithm). Let $f: \mathbb{R}^n \to \mathbb{R}$ have continuous second partial derivatives over an open convex set $D \subset \mathbb{R}^n$, let $J(\theta)$ be Lipschitz continuous over $D$, i.e. $\|J(\phi) - J(\theta)\| \le \gamma\|\phi - \theta\|$ for all $\phi, \theta \in D$, and let $\|J(\theta)\| \le \alpha$ for all $\theta \in D$. If there exists a critical point $\theta^*$ such that $J(\theta^*)^T r(\theta^*) = 0$, then the sequence $\{\theta_k\}$ generated by the Gauss-Newton algorithm converges to $\theta^*$.

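Proof. Herein simplified symbols are introduced for the following proof:
$$\theta_k = (\theta_1^k, \theta_2^k, \cdots, \theta_n^k)^T, \qquad \theta^* = (\theta_1^*, \theta_2^*, \cdots, \theta_n^*)^T,$$
$$J = J(\theta), \quad J_k = J(\theta_k), \quad J_* = J(\theta^*), \quad r_k = r(\theta_k), \quad r_* = r(\theta^*).$$
By $J_*^T r_* = 0$ and the Lipschitz continuity of $J$, we have
$$\|(J - J_*)^T r_*\| \le \eta\|\theta - \theta^*\|. \qquad (23)$$
Let $\lambda$ be the smallest eigenvalue of $J_*^T J_*$; then there exists $\epsilon_1$ such that $\eta < \lambda$ when $\theta \in N(\theta^*, \epsilon_1)$. Due to the Lipschitz continuity of $J$, there exists $\epsilon_2$ such that, when $\theta \in N(\theta^*, \epsilon_2)$,
$$\|J^T J - J_*^T J_*\| \le \kappa < \lambda - \eta.$$
Notice $\|(J_*^T J_*)^{-1}\| \le 1/\lambda$; by Lemma 1,
$$\|(J^T J)^{-1}\| \le \frac{c}{\lambda}, \qquad c \in \left(1, \frac{\lambda}{\eta}\right). \qquad (24)$$
Take $\epsilon = \min\{\epsilon_1, \epsilon_2, (\lambda - c\eta)/(c\alpha\gamma)\}$ and suppose that after $k$ iterations $\theta_k \in N(\theta^*, \epsilon)$, so that $\|(J_k^T J_k)^{-1}\| \le c/\lambda$. Then the $(k+1)$-th step $\theta_{k+1}$ is well defined, and
$$\begin{aligned}
\theta_{k+1} - \theta^* &= \theta_k - \theta^* - (J_k^T J_k)^{-1} J_k^T r_k \\
&= -(J_k^T J_k)^{-1}[J_k^T r_k + J_k^T J_k(\theta^* - \theta_k)] \qquad (25)\\
&= -(J_k^T J_k)^{-1}[J_k^T r_* - J_k^T(r_* - r_k - J_k(\theta^* - \theta_k))].
\end{aligned}$$
Hence, by Lemma 2,
$$\|r_* - r_k - J_k(\theta^* - \theta_k)\| \le \frac{\gamma}{2}\|\theta_k - \theta^*\|^2. \qquad (26)$$
By the hypothesis $J_*^T r_* = 0$ and relation (23),
$$\|J_k^T r_*\| = \|(J_k - J_*)^T r_*\| \le \eta\|\theta_k - \theta^*\|. \qquad (27)$$
Synthesizing relations (24), (26), (27) and the hypothesis $\|J_k\| \le \alpha$, in the light of formula (25) it is readily found that
$$\begin{aligned}
\|\theta_{k+1} - \theta^*\| &\le \|(J_k^T J_k)^{-1}\|\left(\|J_k^T r_*\| + \|J_k\| \, \|r_* - r_k - J_k(\theta^* - \theta_k)\|\right) \\
&\le \frac{c}{\lambda}\left(\eta\|\theta_k - \theta^*\| + \frac{\alpha\gamma}{2}\|\theta_k - \theta^*\|^2\right) \qquad (28)\\
&\le \left(\frac{c\eta}{\lambda} + \frac{\lambda - c\eta}{2\lambda}\right)\|\theta_k - \theta^*\| < \|\theta_k - \theta^*\|,
\end{aligned}$$
that is,
$$\|\theta_{k+1} - \theta^*\| \le \rho\|\theta_k - \theta^*\|, \qquad \rho = \frac{c\eta + \lambda}{2\lambda} < 1,$$
so that $\theta_{k+1}$ also lies in $N(\theta^*, \epsilon)$.

In the preceding proof, setting $k = 0$ immediately yields the first step of the induction; therefore, by the principle of mathematical induction, the above argument holds for any $k$. Whence
$$\|\theta_{k+1} - \theta^*\| \le \rho^{k+1}\|\theta_0 - \theta^*\|,$$
and when $k \to \infty$, $\|\theta_k - \theta^*\| \to 0$, that is
$$\lim_{k \to \infty} \theta_k = \theta^*,$$
which indicates that the sequence $\{\theta_k\}$ generated by the Gauss-Newton algorithm converges to $\theta^*$. □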

Lemma 3 (Affine Invariance of Step). For the Gauss-Newton algorithm, the iteration step is invariant under affine transformations.

Proof. Let $U$ be an $n \times n$ nonsingular matrix and define $\hat f(\phi) = f(U\phi)$; then we have
$$\nabla \hat f(\phi) = U^T \nabla f(\theta), \qquad \nabla^2 \hat f(\phi) = U^T \nabla^2 f(\theta) U, \qquad \theta = U\phi. \qquad (29)$$
Then the step of the Gauss-Newton algorithm can be expressed as
$$\delta\phi = -[U^T \nabla^2 f(\theta) U]^{-1}[U^T \nabla f(\theta)] = U^{-1}\delta\theta, \qquad (30)$$
that is, the step is invariant under affine transformations. In particular, for the least-squares model the corresponding transformation has the form
$$\widehat{[J^T r]}(\phi) = U^T [J^T r](\theta), \qquad \widehat{[J^T J]}(\phi) = U^T [J^T J](\theta) U, \qquad \phi = U^{-1}\theta, \qquad (31)$$
by virtue of which all requirements of Theorem 1 are satisfied for the transformed variables and the proof of Theorem 1 remains valid, so
$$\lim_{k \to \infty} \phi_k = \phi^*, \qquad \phi^* = U^{-1}\theta^*. \qquad (32)$$
□
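Eq. (30) can be verified directly for one Gauss-Newton step. The sketch below assumes a hypothetical exponential residual model with made-up data and an arbitrary nonsingular $U$; it computes $\delta\theta$ in the original variables and $\delta\phi$ for the transformed residual $\hat r(\phi) = r(U\phi)$, whose Jacobian is $J(U\phi)U$, and compares $\delta\phi$ with $U^{-1}\delta\theta$.

```python
import math

E = [0.5, 1.0, 1.5, 2.0, 2.5]
y = [1.5, 1.0, 0.7, 0.5, 0.35]      # hypothetical observations

def residual(t):
    return [t[0] * math.exp(-t[1] * e) - yi for e, yi in zip(E, y)]

def jacobian(t):
    return [[math.exp(-t[1] * e), -t[0] * e * math.exp(-t[1] * e)] for e in E]

def gn_step(r, J):
    """delta = -(J^T J)^{-1} J^T r for two parameters."""
    a = sum(row[0] * row[0] for row in J)
    b = sum(row[0] * row[1] for row in J)
    d = sum(row[1] * row[1] for row in J)
    g0 = sum(row[0] * ri for row, ri in zip(J, r))
    g1 = sum(row[1] * ri for row, ri in zip(J, r))
    det = a * d - b * b
    return [-(d * g0 - b * g1) / det, -(-b * g0 + a * g1) / det]

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

U = [[2.0, 0.5], [0.3, 1.0]]        # hypothetical nonsingular transformation
det_U = U[0][0] * U[1][1] - U[0][1] * U[1][0]
U_inv = [[U[1][1] / det_U, -U[0][1] / det_U], [-U[1][0] / det_U, U[0][0] / det_U]]

theta = [2.0, 0.6]
phi = apply(U_inv, theta)           # theta = U phi

d_theta = gn_step(residual(theta), jacobian(theta))
J = jacobian(apply(U, phi))
J_hat = [[row[0] * U[0][j] + row[1] * U[1][j] for j in range(2)] for row in J]  # J U
d_phi = gn_step(residual(apply(U, phi)), J_hat)
print(d_phi, apply(U_inv, d_theta))
```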


Remark. In the light of Theorem 1, the Gauss-Newton algorithm involves only the reduced Hesse matrix $H(\theta)$, and the corresponding error estimate also involves only $H(\theta)$. Therefore, from the conventional definition of the covariance matrix (denoted by $V$), it reads
$$V^{-1}(\theta, x) = H(\theta, x). \qquad (33)$$
The index $x$ is restored henceforth. By formulas (15), (19), and (20), the Hessian element is expressed as
$$H_{ij}(\theta, x) = L_e \sum_{l=1}^{m} \frac{x_l}{\sigma_l}\frac{\partial \sigma_l}{\partial \theta_i}\frac{\partial \sigma_l}{\partial \theta_j}. \qquad (34)$$
Define $\tilde H(\theta, x)$ through $H(\theta, x) = L_e \tilde H(\theta, x)$, and recast $\tilde H(\theta, x)$ as
$$\tilde H(\theta, x) = A Y A^T, \qquad (35)$$
where
$$A_{il} = \frac{\partial \sigma_l}{\partial \theta_i}, \qquad Y = \mathrm{diag}\left(\frac{x_1}{\sigma_1}, \frac{x_2}{\sigma_2}, \cdots, \frac{x_m}{\sigma_m}\right). \qquad (36)$$
Since $H$ and $\tilde H$ differ only by a constant factor ($L_e$), the latter is often adopted in the following discussions; the conclusions apply equally to the former.

By virtue of Lemma 3, affine transformations do not change the optimization result, so it is useful to investigate the properties of $\tilde H$ in special affine coordinates, which will disclose some peculiar features of the optimization.


Theorem 2 (Independence of Optimal Parameters). For each parameter, there exists an affine transformation under which that parameter is relatively independent of the others.




Proof. Since $\tilde H$ is a symmetric positive definite matrix, according to the Cholesky decomposition theorem there exists an affine transformation such that
$$\tilde H = L D L^T, \qquad (37)$$
where $D$ is a diagonal matrix and $L$ is a unit lower triangular matrix, whose inverse $\Gamma = L^{-1}$ is also a unit lower triangular matrix. Whence
$$D = \Gamma \tilde H \Gamma^T. \qquad (38)$$
By Lemma 3, this affine transformation is equivalent to introducing a new variable $\phi$ with the relation
$$\phi = (\Gamma^T)^{-1}\theta. \qquad (39)$$
Since $D$ is diagonal, every element of $\phi$, i.e. $\phi_i$ ($i = 1, 2, \cdots, n$), is independent of the other elements. Note that $\Gamma^T$ is a unit upper triangular matrix, and its inverse is also a unit upper triangular matrix of the form
$$(\Gamma^T)^{-1} = \begin{pmatrix} 1 & t_{21} & \cdots & t_{n1} \\ & 1 & \cdots & t_{n2} \\ & & \ddots & \vdots \\ & & & 1 \end{pmatrix},$$
by which we have
$$\phi_n = \theta_n.$$
Since the order of the parameters, that is the parameter index, has no absolute meaning, the above statements indicate that for any parameter $\theta_i$ ($i = 1, 2, \cdots, n$) there exists an affine transformation under which $\theta_i$ is independent of the others. □


Remark. Since we can always have $\phi_i = \theta_i$, this theorem indicates that, from the point of view of the optimization process, we can directly utilize the original parameters instead of actually performing an affine transformation. In addition, the “independence conjecture” adopted in Sect. 3.2.1 now becomes a bona fide theorem.
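The construction in the proof of Theorem 2 can be traced on a small example. The sketch below (plain Python; the matrix and parameter vector are hypothetical) computes $\tilde H = LDL^T$, forms $\Gamma = L^{-1}$, checks that $\Gamma \tilde H \Gamma^T$ is the diagonal matrix $D$ of Eq. (38), and confirms that $\phi = L^T\theta = (\Gamma^T)^{-1}\theta$ leaves the last component unchanged.

```python
def ldl(H):
    """Return (L, D) with H = L diag(D) L^T and L unit lower triangular."""
    n = len(H)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = H[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (H[i][j] - sum(L[i][k] * L[j][k] * D[k] for k in range(j))) / D[j]
    return L, D

def unit_lower_inverse(L):
    """Gamma = L^{-1} by forward substitution (also unit lower triangular)."""
    n = len(L)
    G = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for i in range(n):
            G[i][c] = (1.0 if i == c else 0.0) - sum(L[i][k] * G[k][c] for k in range(i))
    return G

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

H = [[4.0, 2.0, 0.6],
     [2.0, 5.0, 1.0],
     [0.6, 1.0, 3.0]]        # hypothetical reduced Hesse matrix (symmetric positive definite)
L, D = ldl(H)
Gamma = unit_lower_inverse(L)
T = matmul(matmul(Gamma, H), transpose(Gamma))   # Eq. (38): should equal diag(D)

theta = [1.2, -0.4, 0.7]                         # hypothetical parameter vector
LT = transpose(L)
phi = [sum(LT[i][k] * theta[k] for k in range(3)) for i in range(3)]   # phi = L^T theta
print(D, phi[-1] == theta[-1])
```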

Theorem 3 (Uniqueness of Minimum). Among all the scan points taken, there is only one point that leads to the smallest error for a given parameter.


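Proof. First introduce an auxiliary function $g$ defined as
$$g_{ij}^l(\theta) = \frac{1}{\sigma_l}\frac{\partial \sigma_l}{\partial \theta_i}\frac{\partial \sigma_l}{\partial \theta_j}. \qquad (40)$$
Here, by Theorem 2 and without loss of generality, we consider only the parameter $\theta_1$ and keep the other parameters fixed; then $g$ simplifies to
$$g_{11}^l(\theta_1) = \frac{1}{\sigma_l}\left(\frac{\partial \sigma_l}{\partial \theta_1}\right)^2, \qquad (41)$$
and $\tilde H$ becomes
$$\tilde H(\theta_1, x) = \sum_{l=1}^{m} x_l g_{11}^l(\theta_1). \qquad (42)$$
Now denote $g_{\min} = \min\{g_{11}^l, l = 1, 2, \cdots, m\}$ and $g_{\max} = \max\{g_{11}^l, l = 1, 2, \cdots, m\}$; noticing $\sum_{l=1}^{m} x_l = 1$, it is easy to see that
$$g_{\min} \le \tilde H \le g_{\max}. \qquad (43)$$
Notice $V \propto \tilde H^{-1}$; from the above relation, when $\tilde H$ reaches its maximum, $V$ reaches its minimum. Therefore, if $g_{11}^l(\theta_1)$ has only one maximum over all scan points ($l = 1, 2, \cdots, m$), then when all the luminosity is concentrated at the point where $g = g_{\max}$, the minimum error is obtained for parameter $\theta_1$. □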


Remark. Generally speaking, for scan experiments the cross section $\sigma_l$ is a function of the center-of-mass energy $E$ (denoted as $E_{cm}$ in an earlier section), and each index $l$ actually corresponds to an energy $E_l$, that is $\sigma_l = \sigma_{E_l}$. However, $E_l$ is a continuous variable and can be denoted directly as $E$, so the cross section reads $\sigma(\theta; E)$. Correspondingly, $g$ is a function of $E$, that is
$$g(\theta; E) = g_{11}^l(\theta) = \frac{1}{\sigma(\theta; E)}\left(\frac{\partial \sigma(\theta; E)}{\partial \theta}\right)^2, \qquad (44)$$
where $\sigma_l \approx \sigma_{E_l} = \sigma(\theta; E)$ is adopted. So the extremum of $g(\theta; E)$ can be acquired by setting $dg(\theta; E)/dE = 0$.
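To illustrate how the maximum of the auxiliary function (44) locates an optimal energy, the sketch below uses a hypothetical Breit-Wigner peak $\sigma(E; M) = A(\Gamma^2/4)/[(E - M)^2 + \Gamma^2/4]$ without background and treats the resonance mass $M$ as the parameter. For this toy model, setting $dg/dE = 0$ analytically gives $|E - M| = \Gamma/(2\sqrt{2})$, so a grid scan on the $E > M$ side should land there. The model and all numbers are illustrative assumptions, not taken from the text.

```python
import math

def sigma(E, M, Gamma=1.0, A=1.0):
    """Toy Breit-Wigner cross section (hypothetical, no background)."""
    w = Gamma ** 2 / 4.0
    return A * w / ((E - M) ** 2 + w)

def g_aux(E, M, h=1e-6):
    """Auxiliary function g = (d sigma / d M)^2 / sigma, Eq. (44), via a central difference."""
    ds = (sigma(E, M + h) - sigma(E, M - h)) / (2.0 * h)
    return ds * ds / sigma(E, M)

M = 10.0
grid = [M + 0.0005 * k for k in range(1, 4001)]      # scan E in (M, M + 2]
E_best = max(grid, key=lambda E: g_aux(E, M))
print(E_best - M, 1.0 / (2.0 * math.sqrt(2.0)))
```

For this symmetric toy peak there is a mirror maximum at $E < M$ as well; either one serves equally well, in line with the remark on equal maxima in the discussion section.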


Theorem 4 (Number Consistency between Parameters and Scan Points). Only $n$ points are needed for an optimal scan scheme in which $n$ parameters are to be determined. The energy of each scan point corresponds to the position where the auxiliary function of a certain parameter reaches its maximum.


Proof. First, we prove the following fact. Consider the summation
$$T_m = \sum_{t=1}^{m} z_{j_t} g_i^{j_t}, \qquad (45)$$
where $i$ denotes a certain parameter and $j_t$ an energy point. Without loss of generality, assume that $j_1 < j_2 < \cdots < j_m$, and denote $g_{\min} = \min\{g_i^{j_t}, t = 1, 2, \cdots, m\}$ and $g_{\max} = \max\{g_i^{j_t}, t = 1, 2, \cdots, m\}$; then
$$z g_{\min} \le T_m \le z g_{\max}, \qquad (46)$$
with $\sum_{t=1}^{m} z_{j_t} = z$. Notice the continuous dependence of $g_i^{j_t}$ on $j_t$ (as explained in the remark of Theorem 3, $j_t = E_{j_t}$ and $E_{j_t}$ is a continuous variable, so $g_i^{j_t}$ is a continuous function of $E_{j_t}$); therefore, by the intermediate value theorem, it is always possible to find a point $j'$ with $j_1 \le j' \le j_m$ such that
$$z g_i^{j'} = T_m. \qquad (47)$$
As shown in Theorem 3, $g_i^j$ is related to the uncertainty of the parameter. Therefore, the above fact indicates that the uncertainty effects due to $m$ points can be realized at one point alone. Next we consider $\tilde H$.

Since $\tilde H$ is a symmetric matrix, there exists an orthogonal matrix $U$ such that
$$U^T \tilde H U = D, \qquad (48)$$
where $D$ is a diagonal matrix. It is assumed that, for the $n$ transformed parameters, $n$ optimal points have been found according to Theorem 3. Although these optimal positions may be distinct from those of $\tilde H$, by virtue of Lemma 3 the two kinds of positions are in one-to-one correspondence. We investigate any one of the diagonal elements of $D$, say $D_i$, which has the form
$$D_i = \sum_{l=1}^{n} x_l g_i^l + y g_i^j, \qquad (49)$$
where $\sum_{l=1}^{n} x_l + y = 1$ and $j \ne l$ ($l = 1, 2, \cdots, n$); $l$ denotes an optimal point, so $y$ indicates the luminosity allocated outside the optimal points (by Eq. (47), its effect can be represented by a single point $j$); $g_i^l$ is defined as
$$g_i^l = \frac{1}{\sigma_l}\left(\frac{\partial \sigma_l}{\partial \phi_i}\right)^2. \qquad (50)$$
Then we try to find an allocation $\sum_{l=1}^{n} y_l = y$ such that, for each parameter,
$$\sum_{l=1}^{n} y_l g_i^l \ge y g_i^j, \qquad (51)$$
or, for all diagonal elements of $D$,
$$G Y \ge Y_g, \qquad (52)$$
with
$$G = \begin{pmatrix} g_1^1 & g_1^2 & \cdots & g_1^n \\ g_2^1 & g_2^2 & \cdots & g_2^n \\ \vdots & \vdots & \ddots & \vdots \\ g_n^1 & g_n^2 & \cdots & g_n^n \end{pmatrix},$$
$Y = (y_1, y_2, \cdots, y_n)^T$, and $Y_g = (y g_1^j, y g_2^j, \cdots, y g_n^j)^T$.

If Eq. (52) is satisfied, moving the luminosity $y$ to the optimal points increases $D_i$. Notice that $V_i \propto D_i^{-1}$; therefore, this allocation of luminosity decreases the error. Moreover, note that
$$D^{-1} = \mathrm{diag}(D_1^{-1}, D_2^{-1}, \cdots, D_n^{-1}); \qquad (53)$$
since an orthogonal transformation keeps the trace of a matrix invariant, while the trace of $\tilde H^{-1}$ is proportional to the sum of squared errors, the previous statements indicate that the luminosity allocated at the $n$ optimal points guarantees the optimality of the results for determining $n$ parameters. Next we prove Eq. (52).

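First, introduce the new variables
$$p_i^l = \frac{g_i^l}{g_i^j}, \qquad z_i = \frac{y_i}{y}.$$
Since for parameter $i$ the optimal point is at position $i$, $g_i^i \ge g_i^k$ ($k = l$ or $j$), and hence $p_i^i \ge 1$ and $p_i^i \ge p_i^l > 0$. The new matrix $P(t)$ is defined as
$$P(t) = \begin{pmatrix} p_1^1 & p_1^2 t & \cdots & p_1^n t \\ p_2^1 t & p_2^2 & \cdots & p_2^n t \\ \vdots & \vdots & \ddots & \vdots \\ p_n^1 t & p_n^2 t & \cdots & p_n^n \end{pmatrix},$$
or
$$[P(t)]_{pq} = p_p^q[\delta_{pq} + (1 - \delta_{pq})t],$$
with $t \in [0, 1]$. Then the equivalent form of Eq. (52) is
$$P(1) Z \ge I, \qquad (54)$$
with $Z = (z_1, z_2, \cdots, z_n)^T$ and $I = (1, 1, \cdots, 1)^T$. Considering the case $t = 0$ and $P(0) Z = I$, one immediately gets $p_i^i z_i = 1$, or $z_i = 1/p_i^i \le 1$ (since $p_i^i \ge 1$). For most cases [11, 13], $g_i^i > g_i^j$ and $g_i^i$ should be far from $g_i^j$, which means $p_i^i \gg 1$. In such a case, we consider the condition
$$\sum_{i=1}^{n} \frac{1}{p_i^i} \le 1, \qquad (55)$$
which is equivalent to $\sum_{i=1}^{n} z_i \le 1$. In the case $\sum_{i=1}^{n} z_i < 1$, one can always increase some $z_i$ to make $\sum_{i=1}^{n} z_i = 1$. Therefore, $P(0) Z \ge I$ always holds for $\sum_{i=1}^{n} z_i = 1$. Notice that $P(t) Z$ is an increasing function of $t$ (all $p_i^l$ and $z_i$ are non-negative), so
$$P(1) Z \ge P(0) Z \ge I.$$
This finishes the required proof. □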
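The construction used in the proof can be traced numerically. The sketch below takes a hypothetical $3 \times 3$ matrix of ratios $p_i^l$ that satisfies $p_i^i \ge 1$, $p_i^i \ge p_i^l > 0$ and condition (55), builds $z_i = 1/p_i^i$, raises one component so that $\sum_i z_i = 1$, and evaluates $P(t)Z$ for several $t$; every component should stay at or above 1 and grow with $t$.

```python
P = [[4.0, 1.5, 0.5],
     [1.0, 3.0, 0.8],
     [0.6, 1.2, 5.0]]   # hypothetical p_i^l: row i = parameter, column l = optimal point
n = len(P)

# Hypotheses used in the proof: p_i^i >= 1, p_i^i >= p_i^l > 0, and condition (55).
assert all(P[i][i] >= 1.0 and all(P[i][i] >= P[i][l] > 0.0 for l in range(n)) for i in range(n))
s = sum(1.0 / P[i][i] for i in range(n))
assert s <= 1.0

Z = [1.0 / P[i][i] for i in range(n)]   # solves P(0) Z = I
Z[0] += 1.0 - s                         # increase one z_i so that sum(Z) = 1

def P_t(t):
    """[P(t)]_pq = p_p^q [delta_pq + (1 - delta_pq) t]."""
    return [[P[p][q] * (1.0 if p == q else t) for q in range(n)] for p in range(n)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

for t in (0.0, 0.5, 1.0):
    print(t, matvec(P_t(t), Z))
```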


Remark. First, Cramer's theorem immediately yields the solution of the equation $GY = Y_g$, but such a solution cannot guarantee $Y \ge 0$. Second, results on positive linear systems can guarantee a positive solution [24, 25]; however, the constraint $\sum_{i=1}^{n} y_i = y$ makes these results inapplicable to the present problem. Third, the condition of Eq. (55) greatly simplifies the proof of Theorem 4. Nevertheless, as can be seen from the above proof, Theorem 4 can actually be applied to some cases where $\sum_{i=1}^{n} 1/p_i^i > 1$. The degree to which this inequality can be relaxed depends on the variation of $p_i^k$ ($k = l$ or $j$). In any case, further mathematical study of this problem is beyond the scope of this paper.


Theorem 5 (Luminosity Distribution Principle). For a multi-parameter scan scheme, the luminosity allocation among points depends on the relative importance of the parameters, the cross sections, and their derivatives with respect to the parameters at the optimal points.


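Proof. We begin with definition (33), i.e. $V^{-1} = H$; by formulas (34) and (35), $V^{-1}$ can be recast as
$$V^{-1} = A Z A^T, \qquad (56)$$
where $A$ is given in Eq. (36) and
$$Z_{ij} = \frac{L_e x_i}{\hat\sigma_i}\delta_{ij}. \qquad (57)$$
We know that $\hat\sigma_i$ is the observed cross section. After the first optimization, we obtain the optimal parameters and have the relation $\sigma_i^* = \sigma_i(\theta^*) \approx \hat\sigma_i$. In the following theoretical analysis, we replace $\hat\sigma_i$ with $\sigma_i^*$; then
$$V = (A^T)^{-1} Z^{-1} A^{-1}, \qquad (58)$$
where
$$Z^{-1}_{ij} = \frac{\sigma_i^*}{L_e x_i}\delta_{ij}. \qquad (59)$$
The elements of $A^{-1}$ are denoted by $\alpha$, i.e. $\alpha_{ij} = (A^{-1})_{ij}$. The diagonal elements of $V$ are the squared errors of the parameters; for the $i$-th parameter, the squared error reads
$$v_{ii} = \frac{1}{L_e}\sum_{l=1}^{n} \alpha_{li}^2 \frac{\sigma_l^*}{x_l}. \qquad (60)$$
For each $v_{ii}$, introduce a weight factor $w_i$ to represent the relative importance of the parameter. Noticing the constraint $\sum_{l=1}^{n} x_l = 1$, introduce a Lagrange multiplier $\lambda$ and construct the Lagrange function
$$F(\theta, x; \lambda) = \sum_{i=1}^{n} w_i v_{ii} + \lambda\left(\sum_{l=1}^{n} x_l - 1\right) = \frac{1}{L_e}\sum_{l=1}^{n}\frac{\sigma_l^*}{x_l}\sum_{i=1}^{n} w_i \alpha_{li}^2 + \lambda\left(\sum_{l=1}^{n} x_l - 1\right). \qquad (61)$$
The first derivative of $F$ gives
$$\frac{\partial F}{\partial x_p} = -\frac{1}{L_e}\frac{\sigma_p^*}{x_p^2}\sum_{i=1}^{n} w_i \alpha_{pi}^2 + \lambda, \qquad (62)$$
and the second derivative of $F$ gives
$$\frac{\partial^2 F}{\partial x_q \partial x_p} = \frac{1}{L_e}\frac{2\sigma_p^*}{x_p^3}\left(\sum_{i=1}^{n} w_i \alpha_{pi}^2\right)\delta_{pq}. \qquad (63)$$
Since the Hessian of $F$ has only diagonal elements and all of them are positive, the extremum determined by the first derivative is a minimum. Setting Eq. (62) to zero, it is readily found that
$$x_p^2 = \frac{\sigma_p^*}{L_e \lambda}\sum_{i=1}^{n} w_i \alpha_{pi}^2, \qquad (64)$$
on the strength of which the luminosity allocation among the different optimal points can be obtained. □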


Remark. The weight factor $w_i$ is usually determined according to prior experience. Take the $m_\tau$ scan as an example: for parameters 1 ($m_\tau$), 2 ($\epsilon$), and 3 ($\sigma_{BG}$), the corresponding weights are set to $w_1 = 1$, $w_2 = w_3 = 0$, which indicates that only the uncertainty of $m_\tau$ is of concern. Under this condition, Theorem 5 yields
$$x_1 : x_2 : x_3 = \left(|\alpha_{11}|\sqrt{\sigma_1^*}\right) : \left(|\alpha_{21}|\sqrt{\sigma_2^*}\right) : \left(|\alpha_{31}|\sqrt{\sigma_3^*}\right). \qquad (65)$$
The cross sections and their derivatives are calculated at the energy points [12] $E_1 = 3.5538$ GeV, $E_2 = 3.595$ GeV, and $E_3 = 3.50$ GeV, respectively. The optimal luminosity fractions at these points are $x_1 = 70.0\%$, $x_2 = 21.8\%$, and $x_3 = 8.2\%$. Compared with the results of the sampling technique (refer to Ref. [12] or the preceding section on the sampling method), $x_1 = 67.5\%$, $x_2 = 22.5\%$, and $x_3 = 10.0\%$, the two sets of results agree with each other fairly well. In the calculation, some relevant values are kept the same as those used in Ref. [12]: $m_\tau = 1.77699$ GeV, $B_{e\mu} = 0.06194$, $\epsilon = 14.2\%$, and $\sigma_{BG} = 0.024$ pb.
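Eq. (64), together with the normalization $\sum_p x_p = 1$, gives $x_p \propto \sqrt{\sigma_p^* \sum_i w_i \alpha_{pi}^2}$, which reduces to (65) when only $w_1 \neq 0$. The sketch below evaluates this rule for hypothetical values of $\alpha$, $\sigma^*$ and $w$ (not the $m_\tau$ inputs, since the elements $\alpha_{li}$ are not listed here) and checks that a nearby feasible allocation gives a larger weighted squared error $\sum_i w_i v_{ii}$ from Eq. (60), with $L_e$ set to 1.

```python
import math

def luminosity_fractions(alpha, sigma_star, w):
    """x_p proportional to sqrt(sigma_p^* * sum_i w_i alpha_pi^2), normalized to 1 (Eq. (64))."""
    c = [sigma_star[p] * sum(w[i] * alpha[p][i] ** 2 for i in range(len(w)))
         for p in range(len(sigma_star))]
    roots = [math.sqrt(v) for v in c]
    total = sum(roots)
    return [r / total for r in roots]

def weighted_sq_error(x, alpha, sigma_star, w, L_e=1.0):
    """sum_i w_i v_ii with v_ii from Eq. (60)."""
    return sum(w[i] * sum(alpha[l][i] ** 2 * sigma_star[l] / x[l] for l in range(len(x)))
               for i in range(len(w))) / L_e

alpha = [[2.0, 0.5],
         [-1.0, 1.5]]        # hypothetical elements of A^{-1}: row = point, column = parameter
sigma_star = [1.0, 9.0]      # hypothetical fitted cross sections at the optimal points
w = [1.0, 0.0]               # only parameter 1 is of concern

x = luminosity_fractions(alpha, sigma_star, w)
print(x, weighted_sq_error(x, alpha, sigma_star, w))
```

The square root in the allocation comes from balancing $\sigma_p^*/x_p^2$ against the common multiplier $\lambda$, so points with larger cross sections or larger sensitivity coefficients receive proportionally more luminosity.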


5 Discussion 


5.1 Equivalence between likelihood and chisquare fits 


We start from the likelihood estimator (refer to Eq. (9)):
$$LF = \prod_{l=1}^{m} \frac{\mu_l^{N_l} e^{-\mu_l}}{N_l!}, \qquad (66)$$
where $N_l$ is the number of observed events at the $l$-th scan point, and $N_l$ is assumed to follow a Poisson distribution with expectation $\mu_l$. Finding the maximum of the likelihood function is equivalent to finding the minimum of the function $f$ defined as
$$f = -\ln LF = \sum_{l=1}^{m} \ln\frac{N_l!}{\mu_l^{N_l} e^{-\mu_l}}. \qquad (67)$$







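With the definition
$$f_l = \frac{N_l}{\mu_l},$$
the second-order derivative of the function $f$ reads
$$\frac{\partial^2 f}{\partial\theta_i \partial\theta_j} = \sum_{l=1}^{m}\left[-\frac{\partial f_l}{\partial\theta_j}\frac{\partial\mu_l}{\partial\theta_i} + (1 - f_l)\frac{\partial^2\mu_l}{\partial\theta_i\partial\theta_j}\right]. \qquad (68)$$
After a little algebra, this equation is reduced to
$$\frac{\partial^2 f}{\partial\theta_i \partial\theta_j} = \sum_{l=1}^{m}\left[\frac{N_l}{\mu_l^2}\frac{\partial\mu_l}{\partial\theta_i}\frac{\partial\mu_l}{\partial\theta_j} + \left(1 - \frac{N_l}{\mu_l}\right)\frac{\partial^2\mu_l}{\partial\theta_i\partial\theta_j}\right]. \qquad (69)$$
Notice that for a Poisson distribution the expectation of $(N_l - \mu_l)^2$ is $\mu_l$; for large $N_l$ we take the approximation $(N_l - \mu_l)^2 \approx \mu_l$, so it is easy to see that
$$\left(\frac{N_l}{\mu_l} - 1\right)^2 \approx \frac{1}{\mu_l}, \qquad \text{for large } N_l.$$
In addition, utilizing the relations $\mu_l = L_l\sigma_l$ ($\sigma$ denotes the theoretical cross section) and $N_l = L_l\hat\sigma_l$ ($\hat\sigma$ denotes the observed cross section), and, after the first optimization, taking the approximation $\sigma_l \approx \hat\sigma_l$, Eq. (69) can be recast as
$$\frac{\partial^2 f}{\partial\theta_i \partial\theta_j} \approx \sum_{l=1}^{m}\left[\frac{N_l}{\sigma_l^2}\frac{\partial\sigma_l}{\partial\theta_i}\frac{\partial\sigma_l}{\partial\theta_j} \pm \frac{\sqrt{N_l}}{\sigma_l}\frac{\partial^2\sigma_l}{\partial\theta_i\partial\theta_j}\right]. \qquad (70)$$
Since $\sigma$ is a physical quantity that is invariant for a definite process, when $N_l$ is large enough it always satisfies
$$\frac{N_l}{\sigma_l^2} \gg \frac{\sqrt{N_l}}{\sigma_l}.$$
This indicates that, compared with the first term of Eq. (70), the second term can be neglected, which leads to
$$\frac{\partial^2 f}{\partial\theta_i \partial\theta_j} \approx \sum_{l=1}^{m}\frac{N_l}{\sigma_l^2}\frac{\partial\sigma_l}{\partial\theta_i}\frac{\partial\sigma_l}{\partial\theta_j} = L_e\sum_{l=1}^{m}\frac{x_l}{\sigma_l}\frac{\partial\sigma_l}{\partial\theta_i}\frac{\partial\sigma_l}{\partial\theta_j}. \qquad (71)$$
Here the relation $N_l = L_e x_l \sigma_l$ is adopted. Comparing with Eq. (34), $\partial^2 f/\partial\theta_i\partial\theta_j$ is just the element $H_{ij}$ of the Hesse matrix. Moreover, both the likelihood and chisquare estimators have the same form of gradient with respect to $\sigma_l$; therefore the Gauss-Newton algorithm can be executed for the likelihood estimator in exactly the same way as for chisquare. To this extent, it is reasonable to claim that the first optimization processes of the likelihood and chisquare fits are equivalent.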
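The size of the neglected second term can be checked on a toy problem. The sketch below assumes a hypothetical one-parameter cross section $\sigma(E;\theta) = e^{-\theta E}$ at three energies, sets $\mu_l = L\sigma_l$, and takes an idealized one-standard-deviation upward fluctuation $N_l = \mu_l + \sqrt{\mu_l}$ (non-integer, for illustration only). It compares the full second derivative (69) with the reduced form that keeps only the first term; the relative difference should shrink roughly like $1/\sqrt{L}$.

```python
import math

E = [1.0, 2.0, 3.0]   # hypothetical scan energies (arbitrary units)
theta = 0.5           # hypothetical parameter value

def hessians(L):
    """Return (full Eq. (69), first term only) for the one-parameter toy model."""
    full = reduced = 0.0
    for e in E:
        s = math.exp(-theta * e)                  # sigma(E; theta)
        mu, dmu, d2mu = L * s, -L * e * s, L * e * e * s
        N = mu + math.sqrt(mu)                    # idealized +1 standard deviation fluctuation
        first = N / mu ** 2 * dmu ** 2
        full += first + (1.0 - N / mu) * d2mu
        reduced += first
    return full, reduced

def rel_diff(L):
    full, reduced = hessians(L)
    return abs(full - reduced) / reduced

for L in (10.0, 1e3, 1e6):
    print(L, rel_diff(L))
```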

















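As a matter of fact, we can view the equivalence between the likelihood and chisquare fits from another viewpoint. Notice that for large $N$ the Poisson distribution approaches a Gaussian distribution, i.e.
$$\frac{\mu^N e^{-\mu}}{N!} \xrightarrow{N \to \infty} \frac{1}{\sqrt{2\pi E_N}}\exp\left[-\frac{(N - \mu)^2}{2E_N}\right], \qquad (72)$$
where $E_N$ is the expectation of $N$. If we take $E_N \approx N$, then
$$LF = \prod_{l=1}^{m}\frac{\mu_l^{N_l} e^{-\mu_l}}{N_l!} \xrightarrow{N \to \infty} \prod_{l=1}^{m}\frac{1}{\sqrt{2\pi N_l}}\exp\left[-\frac{(N_l - \mu_l)^2}{2N_l}\right]. \qquad (73)$$
Whence
$$f = -\ln LF = \frac{1}{2}\sum_{l=1}^{m}\frac{(N_l - \mu_l)^2}{N_l} + \frac{1}{2}\sum_{l=1}^{m}\ln(2\pi N_l). \qquad (74)$$
In the optimization process, the second term is a constant and can be neglected, so $f$ becomes
$$f = -\ln LF = \frac{1}{2}\sum_{l=1}^{m}\frac{(N_l - \mu_l)^2}{N_l}, \qquad (75)$$
so, except for a factor 1/2, this is the chisquare form in Eq. (1), which has been adopted from the very beginning of this study.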


5.2 Effect due to systematic uncertainty 


In the light of the study of the $m_\tau$ scan, the uncertainty due to energy calibration dominates over the others [26, 27]. Some special techniques have been adopted to decrease such an uncertainty. For example, the Compton backscattering technique is utilized to establish beam energy measurement systems at KEDR [28] and BES [29, 30], which bring the accuracy of the beam energy to the level of $10^{-4}$ or better.

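There is a concise way to take this kind of uncertainty into account. We begin with the chisquare formula (1),
$$\chi^2 = \sum_{l=1}^{m}\frac{(N_l - \bar N_l)^2}{\Delta_l^2}, \qquad (76)$$
where $N$ denotes the number of observed events, $\bar N$ the number of theoretically estimated events, and $\Delta$ the error of $N$. Then, according to Refs. [31, 32, 33], the effect of the uncertainty of the energy $E$ can be taken into account by a new chisquare form
$$\tilde\chi^2 = \sum_{l=1}^{m}\frac{(N_l - \bar N_l)^2}{\tilde\Delta_l^2}, \qquad (77)$$
where
$$\tilde\Delta_l^2 = \Delta_l^2 + \left(\left.\frac{\partial\bar N_l}{\partial E}\right|_{E = E_l}\cdot\Delta_{E_l}\right)^2. \qquad (78)$$
Notice that $\Delta_l^2 = N_l$ and $N_l\,(\bar N_l) = L_l\hat\sigma_l\,(L_l\sigma_l)$; with the definition $\sigma'_{E_l} = \partial\sigma/\partial E|_{E = E_l}$, Eq. (78) is recast as
$$\tilde\Delta_l^2 = L_l\tilde\sigma_l, \qquad (79)$$
with the definition
$$\tilde\sigma_l = \hat\sigma_l + L_l(\sigma'_{E_l})^2\Delta_{E_l}^2. \qquad (80)$$
From the proofs of the theorems in Sect. 4, the change from $\sigma$ to $\tilde\sigma$ will affect Theorem 3 and Theorem 5. As far as Theorem 3 is concerned, if $\tilde\sigma^*$ depends only weakly on $E_l$, or $\tilde\sigma^*$ is a smooth function of $E_l$, the extremum of $g(E)$ determined by the condition $dg(E)/dE = 0$ remains almost the same as before. As far as Theorem 5 is concerned, if $\sigma^*$ is replaced by $\tilde\sigma^*$, the corresponding luminosity allocation can be obtained.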
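The rewriting of (78) in terms of an effective cross section can be checked with a few hypothetical numbers in arbitrary but consistent units. With $N_l = L_l\hat\sigma_l$, $\bar N_l = L_l\sigma_l$ and $\Delta_l^2 = N_l$, the chisquare term built from $\tilde\Delta_l^2$ should equal $L_l(\hat\sigma_l - \sigma_l)^2/\tilde\sigma_l$, with $\tilde\sigma_l = \hat\sigma_l + L_l(\partial\sigma/\partial E)^2\Delta_{E_l}^2$.

```python
def effective_error_sq(N_obs, dNbar_dE, delta_E):
    """Eq. (78) with Delta_l^2 = N_l."""
    return N_obs + (dNbar_dE * delta_E) ** 2

def effective_sigma(sigma_obs, dsigma_dE, delta_E, L):
    """Effective cross section: sigma_obs + L (d sigma / dE)^2 Delta_E^2."""
    return sigma_obs + L * dsigma_dE ** 2 * delta_E ** 2

# Hypothetical inputs for one scan point (arbitrary consistent units).
L, sigma_obs, sigma_th, dsigma_dE, delta_E = 100.0, 0.12, 0.10, 0.8, 0.05

N_obs, N_bar = L * sigma_obs, L * sigma_th
chi2_events = (N_obs - N_bar) ** 2 / effective_error_sq(N_obs, L * dsigma_dE, delta_E)
chi2_sigma = L * (sigma_obs - sigma_th) ** 2 / effective_sigma(sigma_obs, dsigma_dE, delta_E, L)
print(chi2_events, chi2_sigma)
```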

5.3 Correlation issue 

Correlation due to systematic uncertainty in scan experiments is always an annoying problem. A so-called scale factor method was used to deal with correlated data [33]. The application of this method to scan experiments is explored in detail in Refs. [35, 36, 37]. The general idea is to introduce a factor corresponding to the correlated uncertainty, so that the factor can be treated as an independent measurement variable. Therefore, the method described in the preceding section for independent systematic uncertainties can be adopted. However, further dedicated study is needed for this case.

5.4 Optimization issue 

In developing the theory of second optimization, many good analytical properties of the objective function have been assumed in order to make the first optimization feasible, as stated in Sect. 4. In particular, in the proof of Theorem 3 the auxiliary function is required to have only one maximum in the scan region. In fact, if the auxiliary function has several equal maxima, any one of them is equally good for parameter determination, so in this case one point is still enough.

However, if the parameters contained in the objective function have multiple solutions [38, 39, 40, 41], the optimization procedure can only be applied to one set of parameters. The general case involving all sets of parameters is a topic for further investigation.

In Sect. 4 the Gauss-Newton algorithm is adopted for the first optimization; it is a universally used method with very good properties. In particular, the affine invariance of the step makes the proof of Theorem 2 (the theorem of independence of optimal parameters) feasible and easy. For many other algorithms, the theory of second optimization should be considered and studied separately.

5.5 Sampling method and analytical theory 

Last but not least, we say a few words about the sampling method and the analytical theory. Evidently, the former provides important clues and implications before the latter is developed, and offers confirmation afterwards. Moreover, the sampling method can study rather more complex and unknown cases that cannot be settled by the present analytical theory.

As far as the analytical theory is concerned, its merit is prominent: once a conclusion is proved mathematically, the relevant issue is settled for good. As has been shown, the analytical theory can provide a robust and optimal scheme design for scan experiments. Moreover, even if some more general and more complex conclusions cannot be proved for the time being, the proved conclusions provide many clues for further exploration. Evidently, the two approaches are complementary and of paramount importance in developing the theory of second optimization for scan experiments.

6 Summary 

In this paper, the sampling technique and analytic analysis are adopted for multi-parameter optimization fitting involving scan data. The conclusions drawn from the two approaches are consistent with each other, just as expected, namely:

1. For an $n$-parameter scan experiment, $n$ energy points are necessary and sufficient for optimal determination of these $n$ parameters;

2. Each optimal position can be acquired by a single-parameter scan (sampling method) or by analysis of the auxiliary function (analytic theory);

3. The luminosity allocation among points can be determined analytically; it depends on the relative importance of the parameters, the cross sections, and their derivatives with respect to the parameters at the optimal points.

The theory of second optimization for scan experiments established in this paper can provide a state-of-the-art scheme for scan experiments that aim at accurate measurements of the parameters of interest.
Appendix A

For the mathematical details involved in this paper, refer to Refs. [42], [43], [44], [45], and [46]. Compiled here are some materials from various areas of mathematics that are used, or assumed to be satisfied, in the proofs in Sect. 4.

Proposition 1. If an $n \times n$ matrix $E$ satisfies $\|E\| < 1$, then $\sum_{k=0}^{\infty} E^k = (I - E)^{-1}$.

Theorem 2. If $A \in \mathbb{R}^{n \times n}$ is a symmetric matrix, there exists an orthogonal matrix $U$ such that $\Lambda = U^T A U$ is a diagonal matrix.

Proposition 3. An orthogonal transformation keeps the trace of a matrix invariant.

Proposition 4. The inverse of a unit upper (lower) triangular matrix is a unit upper (lower) triangular matrix as well.

Proposition 5. An $n \times n$ matrix $E$ is invertible (also called nonsingular or non-degenerate) if and only if its determinant is not equal to zero.

Theorem 6 (Cramer's theorem). If $A \in \mathbb{R}^{n \times n}$ is a nonsingular matrix and $Y = (y_1, y_2, \cdots, y_n)^T \in \mathbb{R}^n$, then the system of linear equations $AX = Y$ has the unique solution $X = (x_1, x_2, \cdots, x_n)^T \in \mathbb{R}^n$ in which, for each $1 \le j \le n$, $x_j = |A|^{-1}|A_{(j)}|$, where $A_{(j)}$ is the matrix formed from $A$ by replacing the $j$-th column of $A$ by $Y$.

Theorem 7 (Cholesky decomposition theorem). If $A \in \mathbb{R}^{n \times n}$ is a symmetric positive definite matrix, there exists a real lower triangular matrix $L$ such that $A = LL^T$, or $A = LDL^T$, where $D$ is a diagonal matrix and $L$ is a unit lower triangular matrix.

Theorem 8. Let $f: \mathbb{R}^n \to \mathbb{R}^1$ have continuous second partial derivatives in an open convex set $S \subset \mathbb{R}^n$. Then

1. $f$ is convex in $S$ if and only if the Hesse matrix $G$ of $f$ is positive semi-definite in $S$;

2. $f$ is strictly convex in $S$ if $G$ is positive definite in $S$, but the converse is not true in general.

Theorem 9. If

1. $f: \mathbb{R}^n \to \mathbb{R}^1$ is strictly convex in the convex set $S$;

2. $f$ has continuous first partial derivatives in $S$;

3. $\theta^*$ is a critical point of $f$ in $S$,

then $\theta^*$ is the strong global minimizer of $f$ over $S$.


Proposition 10 (Lipschitz continuity). Given two metric spaces $(X, d_X)$ and $(Y, d_Y)$, where $d_X$ denotes the metric on the set $X$ and $d_Y$ the metric on the set $Y$ (for example, the metric $d_X(x_1, x_2) = \|x_1 - x_2\|$), a function $f: X \to Y$ is called Lipschitz continuous if there exists a real constant $\gamma > 0$ such that, for all $x_1$ and $x_2$ in $X$, $d_Y(f(x_1), f(x_2)) \le \gamma\, d_X(x_1, x_2)$. Any such $\gamma$ is referred to as a Lipschitz constant for the function $f$.

Theorem 11 (Intermediate value theorem). Consider an interval $I = [a, b] \subset \mathbb{R}$ and a continuous function $f: I \to \mathbb{R}$. If $u$ is a number between $f(a)$ and $f(b)$, i.e. $f(a) < u < f(b)$ or $f(a) > u > f(b)$, then there is a $c \in [a, b]$ such that $f(c) = u$.


Theorem 12 (Sufficient condition for existence of extremum). Suppose that $x^0 = (x_1^0, x_2^0, \cdots, x_n^0)$ is a stationary point of the function $y = f(x) = f(x_1, x_2, \cdots, x_n)$, and moreover that, in a neighborhood of $x^0$, $f(x)$ is defined and continuous and has continuous first and second partial derivatives. Introduce the symbol
$$y^0_{x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n}} = \left.\frac{\partial^k y}{\partial x_1^{p_1}\partial x_2^{p_2}\cdots\partial x_n^{p_n}}\right|_{x = x^0}, \qquad k = \sum_{i=1}^{n} p_i,$$
where the superscript 0 indicates that the partial derivatives are calculated at the point $x^0$. Define the determinant $D_i$ as
$$D_i = \begin{vmatrix} y^0_{x_1x_1} & y^0_{x_1x_2} & \cdots & y^0_{x_1x_i} \\ y^0_{x_2x_1} & y^0_{x_2x_2} & \cdots & y^0_{x_2x_i} \\ \vdots & \vdots & \ddots & \vdots \\ y^0_{x_ix_1} & y^0_{x_ix_2} & \cdots & y^0_{x_ix_i} \end{vmatrix}.$$
For $n$ variables, the $n$ determinants $D_1, D_2, \cdots, D_n$ are calculated in turn; then

1. The sufficient condition for the stationary point $x^0$ to be a minimizer is that all the determinants are positive, that is
$$D_i > 0, \qquad i = 1, 2, \cdots, n;$$

2. The sufficient condition for the stationary point $x^0$ to be a maximizer is that all even-order determinants are positive and all odd-order determinants are negative, that is
$$D_i < 0, \quad i = 1, 3, 5, \cdots; \qquad D_i > 0, \quad i = 2, 4, 6, \cdots.$$

If neither of the above two conditions is satisfied, the stationary point may not be an extremum. If all $D_i$ are zero, higher-order derivatives have to be considered.
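Theorem 12 is Sylvester's criterion applied to the Hesse matrix at the stationary point. A small Python sketch follows; it uses cofactor expansion for the determinants, which is adequate only for small matrices, and the function names are illustrative. It classifies a stationary point from its leading principal minors $D_1, \cdots, D_n$.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def classify(hessian):
    """Apply the sufficient conditions of Theorem 12 to the leading principal minors."""
    n = len(hessian)
    D = [det([row[:i] for row in hessian[:i]]) for i in range(1, n + 1)]
    if all(d > 0 for d in D):
        return "minimum"
    if all((d < 0) if i % 2 == 1 else (d > 0) for i, d in enumerate(D, start=1)):
        return "maximum"
    return "undecided"   # the stationary point may not be an extremum

print(classify([[2.0, 1.0], [1.0, 3.0]]))
print(classify([[-2.0, 0.0], [0.0, -3.0]]))
print(classify([[1.0, 0.0], [0.0, -1.0]]))
```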